eleVRant: The problem of 3D spherical video

posted in: Camera Research


In full spherical virtual reality video, there’s this idea that the eye sees a sphere, and so you create an entire sphere of video that simulates all possible directions the eye can be looking in, and this is like reality.

When you naturally move your eye around the pivot point of your neck, there’s a space of possible eye locations. The eye itself sweeps in a circle as you turn your head, and you can sweep by tilting your head up, which is also an arc of a circle. This space is roughly a sphere, so that meshes with the spherical video theory, right?

Not so much!

Do this quick experiment: close or cover one eye and look at something relatively close against a farther background. Turn your head very slightly and notice the parallax effect: the close object moves compared to the background. The eye is looking at the same thing, but from different locations on its sphere of possible locations. What it sees from different locations is, obviously, different. Just as obviously, a camera at each of these locations would capture different views, even on the parts of their views that overlap.
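To put rough numbers on that experiment, here is a minimal sketch of how big the parallax shift is. The function name and the distances are made up for illustration, and it assumes the near object and the background both start straight ahead of the eye.

```python
import math

def angular_parallax_deg(eye_shift_m, near_m, far_m):
    """Apparent angular shift (degrees) of a near object against a far
    background when the eye moves sideways by eye_shift_m.
    Assumes both start directly in front of the eye."""
    near_angle = math.atan2(eye_shift_m, near_m)  # how far the near object appears to move
    far_angle = math.atan2(eye_shift_m, far_m)    # how far the background appears to move
    return math.degrees(near_angle - far_angle)

# Hypothetical numbers: the eye moves 2 cm, the object is 0.5 m away,
# and the background is 5 m away.
shift = angular_parallax_deg(0.02, 0.5, 5.0)
print(f"{shift:.2f} degrees")
```

With these made-up distances the shift comes out at around two degrees, which is easily visible, and that is from moving the eye less than an inch.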

Now try this (without straining your eye too much): move your head and body around while keeping your eye still and focused on an object. If you’re actually not moving your eye, what you’re seeing will be the same; no parallax effect.


Once you’re used to that motion, try moving your head and body around a fixed eye location while staring straight ahead, changing the direction you’re looking in to see the full sphere it is possible to see from that point.

That is what a spherical camera captures. It’s theoretically easy in a technical sense: you have a bunch of cameras or one or a few wide-angled cameras and you capture incoming light from all directions. But the sphere that a theoretical perfect eye can see is different from the changing sphere of vision you’d see as you turn your head around.

The 360-degree spherical video that is captured around a point I call point 360. The 360 field of vision you see with one eye as you move your head I call natural 360. In natural 360, it’s not just the portion of the visual sphere that changes. Everything in your vision changes, in small but very important ways.

So now let’s talk about stereo vision. It’s extremely easy to capture stereo video as seen from a fixed direction: one camera for each eye, and the two cameras capture light the same way your eyes would. The slightly different views let you see depth information.

So how do you capture video that is both 360 and in stereo?


For a while I’d been thinking about it like this: point 360 is easy, and stereo is easy, but putting them together is the hard problem everyone is trying to solve. What I realized recently is that that was completely the wrong way of thinking about it. The hard problem is not how to create 360 and stereo, but how to create natural 360, the 360 you see as you turn your head, for just one single eye.

If you can do natural 360 for one eye, doing it in stereo is trivial, as both eyes have basically the same sphere of possible positions.

Try closing one eye and then the other, seeing how the parallax of a nearby object changes. If you look out of your right eye, and then turn your head so that your left eye is where your right eye was and look at the same location, your two different eyes should see the same view. 3D 360 isn’t a matter of setting up two different spheres of cameras, one for each eye; the same camera can function to show the right eye view when your head is turned relatively left, and the left eye view when the head is turned relatively right.
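The geometry can be checked with a small top-down sketch. It assumes a simplified head: the neck pivot sits at the origin, and both eyes sit a fixed distance in front of it. The eye spacing and forward offset are illustrative guesses, not measurements, and the function names are my own.

```python
import math

def eye_positions(yaw_rad, ipd_m=0.064, forward_m=0.10):
    """Top-down (x, y) positions of the left and right eye for a head
    rotated by yaw_rad (counterclockwise) about the neck pivot at the origin.
    ipd_m and forward_m are assumed, illustrative values."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)

    def rotate(x, y):
        return (c * x - s * y, s * x + c * y)

    left = rotate(-ipd_m / 2, forward_m)
    right = rotate(ipd_m / 2, forward_m)
    return left, right

def swap_yaw(ipd_m=0.064, forward_m=0.10):
    """Yaw that carries the left eye onto the spot the right eye had at yaw 0.
    Negative means turning the head to the right."""
    return -2 * math.atan2(ipd_m / 2, forward_m)

_, right_before = eye_positions(0.0)
left_after, _ = eye_positions(swap_yaw())
print(right_before, left_after)  # the two positions coincide
```

Both eyes ride the same circle around the pivot, so one ring of camera positions covers both of them; which eye a given camera serves depends only on how the head is turned.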

Someone with vision in only one eye can capture all the depth information that someone with vision in two eyes can see, it just takes a little longer. All you have to do is turn your head a bit. The experience of that information may be slightly different, but we create models of the space around us in the same way.

I think this is one of the reasons wavy handheld “documentary”-style video is popular as a way to make things seem more real. Functionally you have much more 3d information about the space when you simply move the camera a few inches than you do with a non-moving 2-camera “3D” shot.

There’s no way to get that information when there’s just one fixed sphere of footage to choose from per eye, no matter how many cameras were used to create that sphere. There’s no possible camera rig, setup, or software that can output a normal static video or pair of videos that have that sort of actual stereoscopic spherical content. The amount of information you need (a section of a sphere for each possible eye position) would require storing all the video for all the cameras (theoretically this could be in a single video file, not that it’d be watchable in a standard player), and then the video playing software would track your head, see where the eye is, grab and interpolate the footage from the closest cameras, and stitch together the view that the eye sees, in real time.
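As a rough sketch of the "grab the closest cameras" step only: given a tracked eye position, pick the nearest cameras and weight them by inverse distance. The rig layout, the function name, and the weighting scheme are all assumptions for illustration. Real view interpolation would also have to warp the footage rather than simply blend it, and that part is not shown.

```python
import math

def blend_weights(eye_pos, camera_positions, k=2):
    """Pick the k cameras closest to eye_pos and return
    {camera_index: weight}, using inverse-distance weights that sum to 1."""
    dists = sorted(
        (math.dist(eye_pos, cam), i) for i, cam in enumerate(camera_positions)
    )
    nearest = dists[:k]
    # If the eye sits exactly on a camera, use that camera alone.
    for d, i in nearest:
        if d == 0.0:
            return {i: 1.0}
    inverse = {i: 1.0 / d for d, i in nearest}
    total = sum(inverse.values())
    return {i: w / total for i, w in inverse.items()}

# Hypothetical rig: 8 cameras evenly spaced on a circle of radius 10 cm.
rig = [
    (0.10 * math.cos(2 * math.pi * n / 8), 0.10 * math.sin(2 * math.pi * n / 8))
    for n in range(8)
]

# Eye halfway between camera 0 and camera 1.
midpoint = ((rig[0][0] + rig[1][0]) / 2, (rig[0][1] + rig[1][1]) / 2)
weights = blend_weights(midpoint, rig)
print(weights)
```

The player would have to redo this for every frame and every head movement, which is why the raw footage from every camera has to be kept around.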

This is certainly possible. You’d need a standard camera setup that the software knows how to deal with. Also the software would have to exist. And you’d need a really fast computer to be able to do this in real time fast enough to avoid simulation sickness.

There’s another possible way to get a true simulation of natural 360 stereo: if you’ve got a 3d model of your space, you can easily simulate what the eye sees from any location. Creating the 3d model ahead of time means less real-time work for your computer. That’s one reason that fancy 3d games are some of the first things popping up for virtual reality. Any game where you move around in 3d already has a way to render the view from one arbitrary point, so doing another for the other eye is trivial. A pre-recorded static video where you can’t even actually move seems like it should be simpler, but yeah, it’s not.
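Here is a minimal sketch of why the second eye is cheap once you have a 3d model: the same projection runs twice from two slightly different points. It uses a bare pinhole camera looking down the +z axis; the eye spacing, focal length, and function names are assumptions, not anything from a real engine.

```python
def project_x(point, eye_x, focal=1.0):
    """Horizontal image coordinate of a 3D point (x, y, z) as seen by a
    pinhole camera at (eye_x, 0, 0) looking down +z."""
    x, _, z = point
    return focal * (x - eye_x) / z

def disparity(point, ipd_m=0.064, focal=1.0):
    """Difference between the left-eye and right-eye image positions.
    Works out to focal * ipd_m / z, so it shrinks as the point moves away."""
    left = project_x(point, -ipd_m / 2, focal)
    right = project_x(point, ipd_m / 2, focal)
    return left - right

near = disparity((0.0, 0.0, 1.0))  # a point 1 m in front of the head
far = disparity((0.0, 0.0, 2.0))   # the same point pushed back to 2 m
print(near, far)
```

Doubling the distance halves the disparity, which is the depth cue the two eyes pick up. A renderer can produce it for any head position because it knows where everything in the scene actually is.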

It’s remarkable that the brain can take flat (or flat spherical) camera footage and render it into 3d by putting multiple views together. 3d movies rely on this ability to capture 3d footage without the camera actually knowing any of the depth information. It’s like cheating. But once you can move your head, it may be easier to actually capture that depth information, build a point cloud by processing the different camera views or using a 3d scanner or Kinect or something, and create an actual virtual 3d model of the world that you can render different views of the same way a video game would, rather than film and play a video where the only 3d-rendering processing power is inside the human brain.

That’s all assuming we want true 360 3d. If all we want is to create a video experience that seems like 360 3d, the answer may be different!

We are currently experimenting with using multiple cameras in separately-stitched spheres for each eye, with some amount of stereo vision but no natural 360 for a single eye, no stereo if you tilt your head, etc. The brain creates an effective 3d model of the world when you look out of only one eye and move your head, to the extent that many people without stereoscopic vision don’t realize they’re missing anything. In the same way, perhaps when you see a video that has some amount of stereoscopicness but lacks parallax information, you won’t even notice the lack of parallax.


I have two strong predictions:

  1. Stereoscopic vision and/or head-tracked parallax will be an integral part of VR video. Modern flat cinematography is full of cameras slowly moving along tracks or flying around to help give us a sense of 3d space through parallax, and eventually seasoned VR viewers will be able to handle these large-scale camera motions without getting too sick, but VR is also an amazing platform for small-scale intimate video, for truly feeling a sense of being yourself, still in your own body and head, elsewhere, without motion. When you don’t have camera movements to give depth information, you’ve got to get it some other way.
  2. Flat video is going to be old-fashioned before long. Video, or virtual reality video as we now call it, may not be fully spherical, but it will be fully immersive. Perhaps you face in one direction the entire time and can see in your entire field of vision, with no video behind or above you. Perhaps there is only stereoscopic vision for what’s right in front of you and not in your peripheral vision, just like in real life, and viewers know they’re not supposed to turn their head much, just as no one expects part of a movie to appear behind them in a theater. Our initial results in this style are promising and we expect we’ll have a demo to show you in a week or two.


Or perhaps VR video will always feel sickeningly unrealistic until we add every single detail right down to layering on a 3d digital reconstruction of your own nose. We don’t know! Nobody knows! But we’re working on it, and we’ll be sure to continue sharing our results along the way. Hooray for the existence of research groups.

In the meantime, check out the demos we currently have available!

There’s also a sense in which none of this matters. I’ve seen movies that I can recall many visual details from clearly, but not whether they were in 3d or not. I have clear visual memories of stories where I can’t remember whether I actually saw a movie or whether they’re images I constructed in my head while reading a book. There have been times when a person flapped and vibrated their wet transient bacteria-covered meat in a way that vibrated the air in a way that my brain decoded into sounds, and then words, and then meanings, that have more real visual impact than most things actually experienced through my eyes.

It’s a well-known yet bizarre psychological fact that people regularly replace real memories of actual fully-immersive fully-3d life experiences with complete fabrications, and don’t know the difference. I’m certain that in the future people will think back on an experience of a story and be unsure whether they saw it in virtual reality or on flat video. So why is it so important to try to simulate reality as accurately as possible?

In the end I think it’s not really all that important, which is why I want it to exist as soon as possible, to be easy to create and view content without all these tech problems, and then we can get to the part where we start using it as a medium. After all, what makes a great medium great is that it’s not the medium itself that’s interesting, despite how new and exciting it is right now. Someday soon, the most interesting thing about VR will be what we use it for.