They begin as soon as you start filming. Cameras, like eyes, use lenses to look at a 3D scene and project it onto a flat surface. The kind of projection that cameras and eyes perform is called a perspective projection. A perspective projection seems natural to us because it’s the same projection that our eyes use, but, as with every projection from three dimensions down to two, information has to be lost.
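To make that information loss concrete, here’s a minimal sketch (in Python, with illustrative names of my own invention) of a pinhole perspective projection: every 3D point gets divided by its depth, and that depth is exactly what the flat image no longer records.

```python
def project(point, f=1.0):
    """Project a 3D point onto an image plane a focal length f in front
    of a pinhole camera at the origin, looking down the -z axis."""
    x, y, z = point
    if z >= 0:
        raise ValueError("point must be in front of the camera (z < 0)")
    # The perspective divide: a point twice as far away lands half as far
    # from the image center, and its depth is discarded entirely.
    return (f * x / -z, f * y / -z)
```

Two different points, (1, 2, −2) and (2, 4, −4), both project to (0.5, 1.0) — and resolving exactly that ambiguity is the reconstruction work our brains do.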
Some of the information is stuff that we are consciously aware of losing (what’s behind that wall?), but much of it is surprising. Our brains are amazing at inferring and reconstructing a 3-dimensional world from flat perspective projections, because that’s exactly the kind of information they get from our eyes. To really get a sense of how much reconstruction our brains do to compensate for what is lost in a perspective projection seen from a single viewpoint, it can help to look at optical illusions. Here are just three examples of optical illusions that hint at how our brain reconstructs a 3D scene from a perspective projection.
While we can’t know for sure what’s behind a wall, our brains do make inferences about connectivity and consistency to fill in the full shape of things that are partially obscured. When these two figures are obscured, the ‘natural’ underlying shape that our brain fills in is two straight bars, even though the underlying shapes are actually bent.
Similarly, we find it difficult to believe that these two cats are the same size because our brain uses the perspective lines to infer depth cues, and we know that a cat the same size would look smaller at that depth. Even though we logically know that we are looking at a flat image, we can’t help but perceive this image as showing a perspective projection of 3-dimensional space with two differently sized kitties.
Finally, I highly recommend watching Kokichi Sugihara’s videos of “Impossible Motion” to see some amazing and mind-bending examples of how our brain tries to reconstruct the world and of what sorts of information are lost with perspective projection.
If we were just making a single flat video then the perspective projection of the camera would be the only projection that we’d have to worry about, but for VR video the perspective projection generated by each camera is destined to be warped even more.
First, we need to get a full 360 degree view of the world. There don’t exist cameras that can film in every direction at once, so we’re going to achieve this by filming with a lot of cameras. Each camera captures a flat image of the world: the light passing through a plane some distance from the common center point of all of the cameras. All of these images need to be stitched together to create a panoramic sphere of video.
If you’re thinking that it seems like it should take an infinite number of infinitely small flat faces to get a perfect sphere, then you are absolutely correct. Our spherical videos aren’t really representing the true view around a point. We’re stitching together a bunch of videos to create a large, roughly spherical image with a lot of flat parts, which we, of course, are going to project onto a sphere.
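That projection onto a sphere is, at heart, just normalization: every point on a flat stitched face gets pushed along the ray from the center until it sits at distance one. A hypothetical sketch:

```python
import math

def to_sphere(p):
    """Project a point on a flat stitched face onto the unit sphere by
    pushing it along the ray from the center (i.e., normalizing it)."""
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)
```

The flat parts don’t go away, of course — the pixels captured by each flat camera face just get smeared unevenly over the corresponding patch of sphere.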
These inaccurate 360 panoramas let us take advantage of panoramic twist to create two different 360 panoramas around a single point and generate an impression of stereo video. But both of those panoramas are actually incorrect; there is only one accurate panorama around any given point. If we had a camera that let us capture exactly that panorama, we wouldn’t have a second panorama to show the other eye and trick our brain into perceiving stereo.
Because of our panoramic twist trick, our projection of a stitched many-sided polyhedron onto a sphere is actually a win for us. But you might wonder if it would be an advantage to not have to do this messy stitching and projecting onto a sphere step. Wouldn’t it be nice to have a real 360 camera? Certainly, I don’t think any of us would be distraught about not having to use loads of cameras and muddle around with imperfect stitching software.
But a 360 camera wouldn’t fix my projection problem. All of our films and formats and file types are for storing flat video information. This means that even if a single camera could record spherical video all the way around it, it would still have to make the world flat to store that information.
There do exist extremely wide fisheye lenses that can capture 180 degrees of video around a point. The light that enters these lenses is a true hemisphere of light, which we promptly mush flat and distort. If you’ve ever seen pictures from a fisheye camera, you probably already have some idea of how their lenses project the images that they capture onto a flat surface. If you haven’t, go check out some real estate listings. Realtors love fisheye lenses because they make rooms look bigger.
The projections that we have to use to make spherical video flat cause the most eleVRanting for us. The spherical panoramas that we generate by stitching together several videos face the same problem as our hypothetical 360 cameras: they need to be stored in formats that don’t understand spheres. If spherical video becomes more popular, perhaps we might start seeing video information being encoded in entirely different spherical formats, but, for now, we are storing our spherical videos flat.
Everyone who has compared a map to a globe or tried to flatten an orange peel has some real intuition for the fact that there is no way to flatten a sphere without distortion. The question for us then becomes: What is the best projection from a sphere to a plane to use for storing and playing our video?
There are lots of projections to choose from. The one that we have used for our videos thus far is the equirectangular projection. This projection makes the lines of longitude of our sphere into parallel lines of constant spacing. The lines of latitude have the same constant spacing, creating “equirectangular” squares between them. We can think of this projection as cutting a hole in the top and the bottom of a sphere, stretching it into a cylinder while keeping the diameter the same and then slicing it and rolling it flat. Emily talks more about the specifics of this projection here.
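With longitude and latitude spaced evenly like that, the forward mapping from a direction on the sphere to a pixel is just two linear rescalings. A minimal sketch, assuming longitude runs over [−π, π] and latitude over [−π/2, π/2]:

```python
import math

def equirect_pixel(lon, lat, width, height):
    """Map longitude in [-pi, pi] and latitude in [-pi/2, pi/2] to (x, y)
    pixel coordinates in an equirectangular frame of the given size."""
    x = (lon + math.pi) / (2 * math.pi) * width   # longitude, evenly spaced
    y = (math.pi / 2 - lat) / math.pi * height    # latitude, evenly spaced
    return x, y
```

Note that an entire line of latitude near a pole — a tiny circle on the sphere — still gets a full row of `width` pixels, which is where the over-representation at the poles comes from.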
This projection has massive distortion at the poles, as well as way more information there. In a sufficiently high quality video, this shouldn’t make a difference, but in a lower quality one, like the ones that we package with our player, you can really see that the top and bottom of the world (the ‘poles’) are way clearer than the equator.
The equirectangular projection seems to be the standard that most current panoramic video players accept and that most tools for creating panoramic video output. It also looks fairly nice in its flat form, as nearly everything is properly connected. However, I really hope that this does not become the long-term standard. Not only does it have severe singularities and not distribute the video data very evenly, but it’s also fairly computationally intensive to turn it back into a sphere. In particular, it is necessary to calculate two arctangents for every pixel in order to get the correct color off of the video texture and onto the projected video. It’s incredibly important to get a high frame rate and low lag when doing virtual reality; needing to perform the fairly expensive arctangent operation so many times per frame really hurts our ability to hit our desired frame rate. Finally, because the equirectangular projection is so distorted, it’s particularly difficult to edit the videos in their flat forms, and no software currently exists for enabling them to be edited as a sphere, although we really wish that someone would make some plug-ins for that.
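Here’s roughly where those arctangents come from. For every pixel the player renders, the view direction has to be converted back into longitude and latitude to look up a color in the equirectangular frame. This is a sketch under the same evenly-spaced-angle conventions as above, not our player’s actual code:

```python
import math

def direction_to_uv(x, y, z):
    """Convert a view direction into (u, v) texture coordinates on an
    equirectangular frame. Note the two arctangents per pixel."""
    lon = math.atan2(x, -z)                  # arctangent number one
    lat = math.atan2(y, math.hypot(x, z))    # arctangent number two
    u = (lon + math.pi) / (2 * math.pi)
    v = (math.pi / 2 - lat) / math.pi
    return u, v
```

Looking straight ahead, `direction_to_uv(0.0, 0.0, -1.0)` lands at (0.5, 0.5), the center of the frame — but every one of the millions of pixels per frame pays for both trig calls.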
As you can probably tell, I’m not a big fan of the equirectangular projection. So what do I think would be better? My personal preference would be for us to use the cube projection, which is the easiest projection of the sphere onto a regular polyhedron. Just squish the sphere onto a cube, then unfold the cube into six squares. You can even rearrange the squares to fill a rectangle so that there isn’t random blank space in the flat video that you are storing.
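By contrast, finding the cube face for a view direction needs no trigonometry at all — just comparisons and one divide. A sketch (the face naming and sign conventions here are illustrative, not the exact OpenGL cube-map convention):

```python
def cube_face(x, y, z):
    """Pick which cube face a view direction hits, and where on that face.
    Returns (face, s, t) with s and t in [-1, 1]. No trig required."""
    ax, ay, az = abs(x), abs(y), abs(z)
    m = max(ax, ay, az)          # the dominant axis picks the face
    if m == ax:
        face = '+x' if x > 0 else '-x'
        s, t = y / m, z / m
    elif m == ay:
        face = '+y' if y > 0 else '-y'
        s, t = x / m, z / m
    else:
        face = '+z' if z > 0 else '-z'
        s, t = x / m, y / m
    return face, s, t
```

This cheapness per pixel is a big part of why GPUs handle cube maps so well.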
Turning the cube projection back into a ‘sphere’ is easy – just put your square faces together into a cube. Cube mapping is already a standard way of storing graphical environment information for games and panoramas. It is generally preferred because it is far simpler and more computationally efficient. Modern GPUs are designed to be good at cube mapping because this technique is so standard. And, in case you think that putting the faces into a cube rather than something truly spherical might look unconvincing, here is an example cubical ‘skybox’ panorama.
While it’s impossible to have the spherical video information be completely evenly distributed across a flat rectangle, the cube map stores the graphical information much more evenly and with less distortion at any point, so the resulting panoramas actually tend to look overall a bit better than equirectangular panoramas in my opinion. They don’t look as good in their flattened form, but that’s not really how you should be viewing spherical VR video anyways. Less distortion also means that videos stored as a cube map may be easier to edit.
Of course, there are a great many other possible projections. We could, for example, get less distortion by using an octahedron, icosahedron, buckyball, or really most other spherical polyhedral mappings, but this would be relatively little gain at the cost of being less computationally efficient to display and more difficult to store efficiently in a rectangular format. There are also more exotic projections that haven’t been explored much at this point. Short of actually creating a video format specifically for VR video, I believe that the cube projection is the most sensible.
Let’s summarize where we are so far. First, we took our real 3D world and used perspective projection to turn it into a large number of flat 2D videos. Next, we took those 2D videos and stitched them together into a spherical panorama, so that, if you stood in the middle and looked around, it should look fairly similar to standing in the real location that we were filming and looking around. We’d love to have stopped there, but we don’t have any way of storing a sphere of video in playable data, so we have to turn our spherical panorama into something that we can store flat in a video file. Finally, our player takes the projected video and inverts the projection, then shows you the part of the world that you can actually see: your “field of view”.
In conclusion, I’m definitely projecting…