A while back, Andrea did a great post about why cube maps are clearly superior to the gluttonous pixel-wasting mess that is the equirectangular projection, but I have a new claim: Projections are like the despotic rulers of a bygone era, powerful, all-consuming, and totally pointless. They, like all dictators in movies where the good guys win, will inevitably be overthrown by the revolution of light fields. Let’s dethrone the monarch first, then I’ll write you a manifesto to rival the masters: a Charge of the Light Field Brigade. There will be moving music and everything.
We are screen biased. We are used to getting our digital world from a forward-facing pixel-dense discrete surface, and the effects of that display technology run deep. It biases the construction of all digital content, all playback paradigms, all compression formats, all development environments, all operating systems, everything.
For example, say you stick a bunch of cameras together and make a natively spherical video, and say that video has all the information you need to get some fancy stereo working in your HMD. That’s great, congratulations, no seriously, good work. Unfortunately, the next step, saving your video so you can look at it, or store it, or edit it in any way, requires that you smoosh your fancy spherical video into some arbitrary rectangle. At the moment you are probably stuck with some 1:2 proportion, but the layout of the rectangle hardly matters. After all, your screens are flat and your storage format is rectangular and your compression standard is a lazy 2-dimensionally-biased jerk that’s only there as a stopgap. Projection mapping is just an intermediary step while we are stuck working mostly on screens instead of in VR/AR.
Don’t get me wrong, there’s no need to break out an .obj file for video, a pile of discrete triangles. You don’t need to store mountains of 3D information for every frame, just stop trying to put a round peg in a square hole. Flat screens were a bootstrapping technology, necessary to interface with the digital until we could bring it out of the metal boxes and glass-fronted handhelds into the world with us. But if we’re not careful, that history of rectilinearity will bias off-screen digital content in the most seemingly innocuous of crevices. Take the pixel:
Before you get all: “HMDs all use LCD screens with square pixels, and in fact so do all projection-based AR systems, so you’re wrong, so there,” take a breather and let me explain.
This is a storage issue, not a playback issue. No matter what we talk about here, there will always have to be a conversion step for your favorite viewing technology. A physical display-type pixel currently looks like a glowy red thing really close to a glowy green thing which is really close to a glowy blue thing. Change the brightness of any one of the three and voila, we see colors. This kind of light-emitting pixel isn’t going away anytime soon.
But the kind of pixel we use to measure digital images (PPI, 1080p, 4K, and so on) is getting pretty frustrating in VR video.
I’m not saying we need to fundamentally change the relationship between pixels in a raster image, essentially just hex color values grouped by your favorite compression, but instead rethink the way VR/AR content is stored. Keeping everything spherical would mean you never resort to flattening. Instead of reusing old formats, spherical-based compression could be optimized for storing the two angles that define a direction in spherical coordinates (polar and azimuthal), rather than stretching at the poles and splurging my limited pixel budget in the worst possible places.
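To put a number on that pole-splurge, here’s a quick back-of-the-envelope sketch (a hypothetical calculation of my own, not anyone’s actual codec math): in an equirectangular frame, the solid angle a pixel covers shrinks with the cosine of its latitude, so rows near the poles burn enormously more pixels per steradian than rows at the equator.

```python
import math

def equirect_pixel_solid_angle(row, height, width):
    """Solid angle (steradians) covered by one pixel in a given row of an
    equirectangular image. Pixels shrink by cos(latitude) toward the poles."""
    lat = math.pi * ((row + 0.5) / height - 0.5)   # -pi/2 .. pi/2, row center
    d_lat = math.pi / height                        # angular height of a row
    d_lon = 2 * math.pi / width                     # angular width of a column
    return math.cos(lat) * d_lat * d_lon

# Compare a pixel at the equator with one in the top row of a 4096x2048 frame.
w, h = 4096, 2048
equator = equirect_pixel_solid_angle(h // 2, h, w)
pole = equirect_pixel_solid_angle(0, h, w)
print(f"equator/pole area ratio: {equator / pole:.0f}x")  # ~1300x
```

In other words, at 4K-ish resolutions each equator pixel covers on the order of a thousand times the sphere area of a top-row pixel, which is exactly the waste a native spherical format could avoid.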
Just this, just eliminating this waste, plus optimizing compression for stereo by educating it on redundancies between the two overlapping eyes, would increase the quality of stereo spherical video and also improve playback. Sure, actual export and compression times might increase, but I am all for an upfront investment of time if we get fewer wasted pixels and fewer distortion passes, and thus higher playback performance, as a result. It’s a good old time/space tradeoff.
What should we call this native storage format? Perhaps the .ssv or the .s3d or, and this is my personal favorite, the .FLMS (frankly lazy middle step). That’s right, you heard me: a frankly lazy middle step. This whole compressing and storage discussion is a little near-future for my tastes. Time for that overthrow I promised you.
A few years out and we can ditch spherical surfaces that need to be awkwardly subdivided into HEALPix cells or sphixels (so that each pixel covers the same surface area), and replace them with volumes of light. One way to think about 3D environments on computers, the way most people are used to, is a wiremesh stage-dressed with geometric primitives coated with different materials and a bunch of lights. But that’s only one way to think about it.
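The equal-area idea itself is easy to sketch. The toy below is a rough illustration of the principle, not the real HEALPix scheme: pick latitude band edges so that sin(latitude) is spaced uniformly, and every band (and every same-width pixel within it) covers the same solid angle from equator to pole.

```python
import math

def equal_area_band_edges(n_bands):
    """Latitude edges for an equal-area spherical grid (a toy sketch, not
    actual HEALPix). Uniform steps in sin(latitude) give equal-area bands."""
    return [math.asin(2 * i / n_bands - 1) for i in range(n_bands + 1)]

def band_solid_angle(lat0, lat1):
    """Solid angle in steradians of the full band between two latitudes."""
    return 2 * math.pi * (math.sin(lat1) - math.sin(lat0))

edges = equal_area_band_edges(8)
areas = [band_solid_angle(a, b) for a, b in zip(edges, edges[1:])]
print([round(a, 4) for a in areas])  # every band is 4*pi/8 ~ 1.5708 sr
```

Every band comes out identical, no stretching, no splurging, which is the whole point of schemes like this.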
Yes, yes, I hear some of you shouting from the rafters with your ghostly wails, “Liiiight feildsss *spooky noise* Talk about Liiiiight feeelids.” OK, ok:
For those that haven’t encountered these before, the concept of light as a field was first theorized by Michael Faraday in 1846 after years of working with magnetic fields, but the first time anyone called it a “light field” was in a 1936 paper by Andrey Gershun on the radiometric properties of light in three-dimensional space. Don’t stop reading just because that sounds complicated. It’s not that fancy. The basic idea is that light can be measured in every direction from a central point. Think of it like this:
In the above image, iron filings act as a spherical sensor which follows rays of magnetism from a central point. The vectors of the field are captured as well as the foot-candle-esque drop-off in intensity over distance from the central point. The digital life of light fields started way back in 1996, because they, like most things in VR, were theorized by researchers well before mobile screen technology matured enough to give us the snappy HMDs of the current VR revolution.
My favorite paper of the period was written by Marc Levoy and Pat Hanrahan of the Computer Science Department at Stanford University on Light Field Rendering. I’m gonna quote this paper a lot, so maybe just go read it.
Light field rendering is a form of image-based rendering, which is just a method that uses 2D images of a scene to stand in for a 3D model of that scene and allow for the creation of new views, no geometric modeling required. A one-directionally-captured light field (currently the only kind that exists) looks a bit like this:
In the top image the plane V looks out much like the human eye, with a cone of visible light converging on one point, creating an array (what would in humans be a 2-image array) of more or less the entire scene. But this is problematic: to change the viewpoint would require, to quote Levoy and Hanrahan here, “a depth value for each pixel in the environment map, which is [only] easily provided if the environment maps are synthetic images.”
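Here’s a minimal sketch of the two-plane idea from the Levoy and Hanrahan paper, with made-up toy data standing in for real captures: treat the captured views as a 4D array L(u, v, s, t), where (u, v) picks the camera position on one plane and (s, t) picks the pixel on the other, and render a brand-new viewpoint between cameras by nothing fancier than linear interpolation.

```python
import numpy as np

# Toy two-plane light field: a 4x4 grid of captured viewpoints (u, v),
# each an 8x8 image (s, t). Random values stand in for captured radiance.
rng = np.random.default_rng(0)
U = V = 4
S = T = 8
light_field = rng.random((U, V, S, T))

def render_ray(u, v, s, t):
    """Sample the light field at a fractional camera position (u, v) by
    bilinearly mixing the four nearest captured views (nearest-neighbor
    in s, t for brevity). No geometry, no depth map required."""
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, U - 1), min(v0 + 1, V - 1)
    fu, fv = u - u0, v - v0
    si, ti = int(round(s)), int(round(t))
    return ((1 - fu) * (1 - fv) * light_field[u0, v0, si, ti]
            + fu * (1 - fv) * light_field[u1, v0, si, ti]
            + (1 - fu) * fv * light_field[u0, v1, si, ti]
            + fu * fv * light_field[u1, v1, si, ti])

# A virtual camera halfway between four captured ones:
new_view = np.array([[render_ray(1.5, 2.5, s, t) for t in range(T)]
                     for s in range(S)])
print(new_view.shape)  # (8, 8)
```

The real paper resamples in all four dimensions and prefilters the samples, but the punchline survives even in this toy: a whole new view falls out of a few multiplies and adds per ray.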
(It is usually at this point in the diatribe that people start with the hot-gluing-kinects-to-my-camera talk, but we will leave that tangent for another post.)
Light field capture and rendering saves us from this segregation of types. Light field rendered images and light field captured images can be effortlessly mixed, and with a bit of simple linear interpolation, any view, any head position, any tilt or angle can be rendered by the inept-est of graphics cards.
So that is my now not-so-secret goal for VR: that VR will become a medium of space. Not of video or games or the web, but of spaces created with both capture and generated images, moving and still, seamlessly integrated. And it looks like spherical light field cameras are the way to do it. Viva la revolution.