Previously, we’ve discussed mono camera balls and stereo polygons. Today’s topic is camera circles: usually a set of eight or more cameras placed symmetrically and facing directly outward in a circle, intended to capture stereo 360 video (possibly with the addition of upward/downward-facing cameras for mono completion of the sphere).
The conclusion will be that camera circles can work for stereo if you do it right, but it’s probably better to arrange those same cameras in a stereo polygon instead.
If you want to see in stereo, you’re going to need different views for each eye. That means the views you take for each eye can’t both be outward-facing from the center of a camera ball, but it doesn’t mean the cameras themselves can’t be outward-facing.
For stereo vision you need parallel views an inter-pupillary distance apart. But you don’t need actual physical pairs of cameras one IPD across, with one set of cameras for left and one for right (as in stereo polygons). There are other setups that contain the necessary views, as long as you take care to use the footage correctly, and a circle of cameras can do it as long as the field of view of the cameras is wide enough to double-cover the scene.
Whether the camera itself faces outward and you take a section of footage centered at an angle, or the camera faces at an angle and you take the center of the footage, doesn’t really matter in a mathematical sense.
If you have eight cameras in a circle, you can point them all outward and then use each camera twice, taking the footage facing to the right for the left eye and the footage on the left side for the right eye. It’s equivalent to having 16 cameras with a smaller field of view that are set up in the stereo polygon configuration.
I’ve talked about panoramic twist as a mathematical thing before, but when it comes to physical camera setups, I like to think of these 16 footage slices as virtual cameras. This is especially useful for considering resolution and field of view. One lens with a 120-degree FoV might become two virtual cameras covering 60 degrees each, or 45 degrees each, or something else, depending on your setup.
The goal is for the right-eye and left-eye virtual cameras to be centered on angles such that their views are parallel and one inter-pupillary distance apart on your camera setup (meaning they’d look like standard stereo pairs). If the size and field of view aren’t right, this might mean leaving out the center section of the footage, which is inefficient, or having overlap between the right and left eye sections, which causes stripes of mono in your final product.
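To make that concrete, here’s a toy calculation (a sketch of my own; the function names and the 64mm IPD default are my assumptions, and it treats each lens as a pinhole on the circle): the twist angle that makes a virtual camera’s view ray pass half an eye-width from center is arcsin((IPD/2)/r), and comparing twice that angle to the per-camera slice width tells you whether you’re wasting the center of each lens or getting mono overlap.

```python
import math

def twist_angle_deg(radius_m, ipd_m=0.064):
    """Angle between a lens's outward axis and each virtual eye camera's
    view direction, chosen so the view ray passes ipd/2 from the center."""
    return math.degrees(math.asin((ipd_m / 2) / radius_m))

def slice_report(n_cameras, radius_m, ipd_m=0.064):
    theta = twist_angle_deg(radius_m, ipd_m)
    slice_width = 360 / n_cameras         # horizontal coverage each eye needs per lens
    needed_fov = 2 * theta + slice_width  # lens FoV that exactly fits both eye slices
    gap = 2 * theta - slice_width         # >0: wasted center strip, <0: mono overlap
    return theta, needed_fov, gap

# Eight cameras on a circle the same diameter as one eye-width times two:
theta, needed, gap = slice_report(8, radius_m=0.064)
# theta is exactly 30 degrees, since asin(0.032 / 0.064) = asin(1/2)
```

With those example numbers you’d want at least a 105-degree lens, and the middle 15 degrees of each lens would go unused by either eye.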
While eight cameras divided into 16 virtual cameras does work, at the moment I don’t think it’s better in practice than eight cameras in a stereo square. Eight outward-facing gopros means eight pieces of footage stitched together per eye, and that’s eight stitching lines per eye, which doesn’t leave a lot of room for a person to be filmed without a stitching line going through their face. While each view is a bit more accurate, I have yet to see the kind of automated stitching that would make it worth it.
In theory, with great algorithms that can smooth over stitching errors by themselves, you’d want as many cameras as possible, leading to a huge amount of needed stitching, but each stitching error would be tiny and fixable algorithmically. This is what I like to imagine companies like JauntVR are going to work towards someday.
JauntVR is one of the only companies serious about trying to figure out stereo video in a simultaneously-captured sphere (or sphere minus bottom, in their case), and they have the best results I’ve seen from anyone with a camera circle setup, so I’m gonna talk about them a bit.
Right now, it looks like Jaunt’s camera setup has 12 gopros around, all angled directly outward (plus two on top for a mono ceiling).
Jaunt is known for frequently saying that they do not use stereo pairs of cameras, that they instead use many outward-facing cameras and then use 3D algorithms. But whatever they brand it as, the end result is a stereo pair of spherical videos that stitches together right and left eye views, if not from stereo pairs of literal cameras, then from stereo pairs of virtual cameras, and personally I think that’s plenty exciting without the extra mystique.
(Jaunt has never divulged their secret algorithms to me, which leaves me free to speculate. Many of their videos are available from their website and it’s fairly easy to see the ring of 12 stitch lines per eye. Whether their stitching of 24 views happened as a result of standard panoramic twist techniques to get virtual stereo pairs, or whether they got a similar-looking result from some more complicated process involving algorithms, I cannot say.)
It’s fun to imagine what one might get if one took those 12 cameras, split them into two sets of six, stitched together two panoramas, and hoped for the best. There are left and right eye cameras spaced apart, so every point in the scene has different views for the left and right eyes, which is what you need for stereo. But there’s no panoramic twist… So would it work? What would happen?
The proof of failure lies in one word: symmetry. When you split a circle of outward facing cameras into two sets, which set is the left eye and which is the right? Both sets are the same under rotational symmetry, completely achiral, so any choice is arbitrary.
But we know in actual views of the world it matters which eye is left and which is right. The stereo polygon setup breaks that symmetry, yielding a chiral pair that tells you which eye is which.
The actual effect of splitting an outward-facing camera circle into two sets, rather than splitting the footage and using panoramic twist techniques, is that the stereo effect warps back and forth between muted and reversed. Every point around the camera is seen in stereo, but half the time the eyes are reversed.
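You can see the eye-reversal numerically with a toy model (my own sketch, not anyone’s actual pipeline: it assumes each eye’s panorama simply uses whichever camera in its set points nearest the viewing direction, and all names here are made up). Positive values mean a normal stereo baseline for that viewing direction; negative means the eyes are swapped:

```python
import numpy as np

def baseline_sign(phi_deg, n=12, radius=1.0):
    """Effective stereo baseline (projected onto the viewer's rightward
    direction) when a circle of n outward cameras is split into two
    alternating sets with no panoramic twist.
    Positive = normal stereo, negative = reversed eyes."""
    cams = np.arange(n) * (360.0 / n)
    left_set, right_set = cams[::2], cams[1::2]  # alternate around the circle

    def nearest_cam_pos(cam_set):
        # camera in the set whose outward axis is closest to the view direction
        diff = (cam_set - phi_deg + 180) % 360 - 180
        a = np.radians(cam_set[np.argmin(np.abs(diff))])
        return radius * np.array([np.cos(a), np.sin(a)])

    baseline = nearest_cam_pos(right_set) - nearest_cam_pos(left_set)
    phi = np.radians(phi_deg)
    rightward = np.array([np.sin(phi), -np.cos(phi)])  # 90 degrees clockwise of view
    return float(baseline @ rightward)

# The sign flips as you look around the circle: stereo, then reversed stereo.
signs = [baseline_sign(phi) for phi in (15, 45, 75, 105)]
```

Which set counts as “left” is of course the arbitrary choice; whichever you pick, the sign alternates as you turn.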
We know from experience that subtle wrongness in stereo, from vertical shift to reversed eyes, can easily trick the brain into thinking it’s pretty good, not perfect but not fundamentally broken. But often in VR a little math can show that what you thought only needed a tiny tweak is actually fundamentally wrong!
Anyway, back to doing things the right way, with panoramic twist.
The more cameras you have, the smaller each stitching error should in theory be, and thus the easier to fix algorithmically, especially if you use all that overlap information to calculate stitching distance (which I hope someday will be completely automated by stitching software; it’s technically feasible but no one’s done it as far as I know). But the direct relationship between more cameras and smaller errors only exists if you’re adding more cameras to a camera ball of constant radius, not if adding cameras means making your camera ball bigger. 12 gopros around makes for a big radius, and the further from center the cameras are, the greater the total stitching error.
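As a rough back-of-envelope model (my own approximation, not anyone’s published numbers): a point at distance D is seen from two adjacent cameras separated by the chord 2r·sin(π/N), so their view directions disagree by about chord/D radians at the seam. More cameras at a fixed radius shrinks the chord; a bigger radius grows it right back.

```python
import math

def seam_error_deg(radius_m, n_cameras, scene_dist_m):
    """Approximate worst-case angular disagreement at a stitch seam for a
    point at scene_dist_m, using a small-angle approximation."""
    chord = 2 * radius_m * math.sin(math.pi / n_cameras)  # adjacent camera spacing
    return math.degrees(chord / scene_dist_m)

# At a 2m subject distance: more cameras on the same ring helps,
# but growing the ring to fit them takes the improvement back.
small_ring = seam_error_deg(0.08, 12, 2.0)
big_ring = seam_error_deg(0.15, 12, 2.0)
```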
For purely theoretical realistic stereo capture purposes with ideal tiny high-res wide FoV cameras, there’s no advantage to having a camera ball much bigger than one eye-width in diameter. But in the current landscape where gopros are the best thing on the market for these purposes, you might be willing to sacrifice some radius size if it means you can get more gopros in there and then use a narrower field of view to get higher resolution.
JauntVR’s overly large radius in particular works for them right now for several reasons. They’re using gopros and need that extra resolution. The greater radius also leaves room to hide an entire cameraman in the blank/branded area below their camera (I don’t think hiding cameramen in your nadir is the future of VR film, but there’s something beautiful about it nonetheless). And they currently produce videos in hyper-stereo, which will never be standard but currently serves well to immediately signify to new viewers that they have entered the surreal world of 360 stereo video.
A multi-outward-facing camera setup, with enough overlap, would in theory allow the producer to change the amount of panoramic twist to be more or less stereo-y in post-production. Moving hand-held shots, which have built-in parallax information, benefit from the easier stitching of low-twist settings; you could transition to deeper stereo and realism when the movement stops and focuses on an actor’s face, for example, and even to hyper-stereo when you want that surreal feel. For gopros right now, filming at a wider angle allows post-production twist flexibility, but at the cost of resolution, and I don’t know that anyone currently has the software or the time to want to tweak twist in post, but it’s something to think about.
While I can understand wanting a camera setup that allows for more horizontal stereoscopy than is realistic, there’s no reason to give stereo camera balls more vertical distance between cameras than is absolutely necessary. Cameras look cool when arranged in a ball, and for mono spherical capture you’d want them evenly spaced, but if you absolutely must have more than one camera to cover the vertical field of view, those cameras should be as close together as possible. Assuming you’re rendering stereo video meant for a viewer looking around with their head level, there should be as little vertical parallax as possible.
Jaunt smartly abandoned their first camera ball prototype, where vertically tilted cameras were spaced apart and not aligned with the center row, in favor of a single disc of cameras with a wide vertical field of view. I’ve seen many camera balls with the kind of camera placement that would be terrible for stereo, but Jaunt’s is the only one I know was intended for stereo. I’ve seen other camera balls where the middle rows of cameras were vertically aligned, and those would work for stereo if the field of view is wide enough. Worst case, the vertical stitching errors are as bad as the horizontal ones, and the vertical errors get better and better the more you flatten your setup.
Of course if you’re gonna do things right and make your vertical cameras close together, you might as well go all the way and also put them in a stereo polygon. I’m delighted by the design of the Bivrost, which as far as I can tell has 20 cameras in a stereo pentagon, vertical field of view covered by two cameras as close together as possible. I don’t think any physical prototypes have been made yet, but at least whoever’s marketing it is marketing something that could potentially do what they’re claiming it does, so, that’s already better than a lot of VR companies.
And is pretty 😀
But in the end, if one really must capture an entire sphere of high-resolution, really well-produced film, the best way to avoid the many stitching errors of the camera circle is to do it the way actual cinema is filmed today: with great cameras, a huge crew, lighting, sound, and almost everything shot piece by piece to be composited later.
Whether your footage ultimately gets composited into a rectangle or a sphere, many of the techniques currently used by the film industry will transfer right over. Actors increasingly act alone in minimal environments. Many major motion pictures are almost entirely, if not actually entirely, digital. Film production is ready for VR as soon as we get better spherical compositing tools, and then stitching errors can be avoided entirely.
Especially when it comes to the future of VR cinematography, if we want a close-up shot where we’re on a desk slowly wandering through a giant landscape of crumpled up discarded poetry for the opening scene of our film about writer’s block or whatever, torn and scribbled words towering over us to make us feel the helpless inferiority of the protagonist who comes into view cheek down on the desk, camera finally coming to a rest just out of range of her quivering left eyelash, cue monologue, slowly pan out by growing bigger and leave through the ceiling to find we’re only looking down into a toy universe where our poet’s angst seems exactly as existentially bereft of importance as she thinks it is, we can’t shrink our camera ball to fit between the pages. It’s going to be rendered and composited just like so much of today’s visuals.
(Yeah technology tends to get real small, but given the physics of lenses and light I don’t think we’ll have a quality pea-sized spherical camera anytime soon. Though I like to imagine tiny servos expanding the tiny camera ball to make it hyper-stereo as a robotic arm moves it through the ceiling…)
Good design of spherical cameras isn’t essential for VR film to have a future, but it IS essential for live captured events, streaming, vlogs, and consumer use. Next time, we’ll talk about minimal arrays and hybrid stereo for consumer applications.
Homework question for 3d modelers: when you render out views from highly realistic 3d modeling or architecture software, can you input an arbitrary vector field to collect virtual light from? I imagine most default to orthogonal projection, but architecture loves wide-angle views, and I don’t know how much work it would be to input the two vector fields for left/right panoramic twist and get out a stereo spherical pair of stitch-free views that you could then use as background for greenscreening etc., or whether these sorts of software let you write your own raycasting thing to use with them, or what.
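For what it’s worth, the vector field for the left/right panoramic twist is easy to write down if a renderer exposes per-pixel ray origins and directions (a sketch under that assumption; the function and the 64mm IPD are my inventions): each equirectangular pixel keeps its ordinary mono lookup direction, but its ray origin is offset sideways, tangent to a circle of radius IPD/2.

```python
import numpy as np

def ods_rays(width, height, eye, ipd=0.064):
    """Per-pixel ray origins and directions for one eye of an
    omni-directional-stereo (panoramic twist) equirectangular render.
    eye = -1 for the left eye, +1 for the right."""
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi    # yaw per column
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi  # pitch per row
    lon, lat = np.meshgrid(lon, lat)
    # direction: the ordinary mono equirectangular lookup direction (y is up)
    d = np.stack([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)], axis=-1)
    # origin: offset ipd/2 sideways, tangent to the viewing circle,
    # perpendicular to the horizontal component of the view direction
    o = eye * (ipd / 2) * np.stack([np.cos(lon),
                                    np.zeros_like(lon),
                                    -np.sin(lon)], axis=-1)
    return o, d
```

Feeding these origins and directions to any raytracer would give a stitch-free stereo spherical pair (with the usual caveat that the twist degenerates at the poles).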