Intel® RealSense™ Technology – Opening New Doors for Filmmakers
Volumetric Capture using Intel RealSense Depth Cameras, By Suzanne Leibrick, Intel® RealSense™ Experience Design Manager
What’s your signature dance move?
I recently had the privilege of attending the Sundance Film Festival in Park City, Utah. As part of the Intel Tech Lodge at Sundance, the Intel® RealSense™ group presented a volumetric capture experience. With the recent launch of Intel Studios – a 10,000 square foot space in Manhattan Beach, Los Angeles – Intel has firmly said “Volumetric capture? We’re here for it.”
At Sundance, we wanted to show that we not only have this amazing, giant studio space, but can also bring volumetric capture to indie creators and smaller studios on a tight budget. We did this using the Intel RealSense D400 series depth cameras. These cameras not only capture RGB, or color data, in the same way a traditional camera does, but capture depth information too. For every pixel in the image, we know how far from the camera that part of the scene is. We know that the wall is farther back than a person. That enabled us to capture over 500 people sharing their signature dance moves with us, in full three dimensions, by using 4 cameras synchronized together.
What is volumetric capture?
Traditional film captures an entire scene from one point of view – that of the camera. Once a film or photograph has been taken, you cannot go back, change where the camera was, change lighting, or easily add or remove objects or people, without expensive re-shoots, or visual effects work. When talking about live action virtual reality, you also cannot move around within that scene – you are limited to being able to turn your head and see every angle from the position of the camera. With volumetric capture, instead of using a single traditional camera, different methods use up to thousands of cameras to capture a scene from every angle. Every object in a space is captured in 3D. Because the scene has been captured in this way, because we have all this 3D data, we can represent the entire space digitally as volumetric pixels or voxels.
Once we have all this captured data, we can do many interesting things: we can show the scene from the perspective of any player within it. We can re-light the scene, or easily remove a problematic object. We can add digital creatures and assets, place new backgrounds, or simply change which camera shot we prefer. In virtual reality, we can allow people to move freely around a space, giving them the ability to really experience a story or event in the way they want to, without being restricted by the filmmaker. This freedom of movement, and the freedom to change content after it’s been filmed, is very exciting to me as both a creator and a viewer of content.
Why is volumetric capture interesting?
Over a hundred years ago, the first films were being made – the language of film was being developed, and it was a very exciting time for creators. With the increasing accessibility of volumetric capture, we are once more in a new age of storytelling. From books, to radio, to film and television, every new technology has changed how we tell stories and see the world, and volumetric capture stands poised to do the same. Volumetric capture has more in common with complex, story-driven games than it does with film – it is story, driven by a collaboration between the creators of content and the viewer. If I can choose what I look at, and when, I can experience the story in a way that I choose. Picture a scene, a coffee shop. There are people at every table. I can watch the older woman next to me, mysteriously tracing letters from book covers. She has a notebook on the table, stuffed with papers and carefully marked tabs. I can wonder what she’s doing, and why. Or I can join the conversation on the other side of me – the people working hard to figure out something important. I could choose to leave the café and follow the dog walker down the street outside. These could be stories of reality, but carefully curated by the creator to show us something new about the lived experiences of others.
I view the future of volumetric film as creations we can experience as entirely new realities. The Star Trek where you can watch the action from every position on the bridge, and join the team on away missions. The epic war movies where you stand on the battlefield. As humans, we have always imagined different worlds than our own. We could read about each other’s worlds, we could show them, but we could not walk within them, except in our dreams.
That’s about to change.
Q. Which Depth Camera did you use?
We used the Intel® RealSense™ D435 depth camera. This camera has a wide field of view, enabling us to capture all the amazing dance moves. Its depth sensor also has a global shutter, which means all the pixels in a frame are captured simultaneously. A rolling shutter, by contrast, reads the frame out row by row over a short interval, which can lead to artifacts when you are recording something that moves very quickly, like dancers.
Q. How many cameras did you use? What was the setup?
We used 4 cameras, spaced equally on a shallow arc at a height of 4 feet from the ground. With this we were able to capture 180° around the participants. Vertically, this captured people roughly from knee height to well above their heads. Each camera was connected to an Intel i7-powered PC to record the data, which was then transferred automatically to a 5th PC to sync it all together. We also used a commercial-grade flash unit to signal the start and end of a capture, allowing us to use that spike in the data to make sure our camera footage was properly synchronized. In addition, we used hardware sync cables to make sure the frames were captured simultaneously; the details are covered in a separate whitepaper.
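As a rough illustration of the flash-based alignment described above, here is a minimal Python sketch. It assumes each recording has already been reduced to a list of per-frame average brightness values, and that the flash shows up as the largest frame-to-frame jump. The function names `flash_frame` and `sync_offsets` are hypothetical and are not part of the RealSense software.

```python
def flash_frame(brightness):
    """Return the index of the frame with the largest brightness jump
    over the frame before it (assumed to be the flash)."""
    if len(brightness) < 2:
        raise ValueError("need at least two frames")
    jumps = [brightness[i] - brightness[i - 1] for i in range(1, len(brightness))]
    return jumps.index(max(jumps)) + 1


def sync_offsets(streams):
    """Return how many leading frames to drop from each stream so that
    the flash lands on the same frame index in all of them."""
    flashes = [flash_frame(s) for s in streams]
    earliest = min(flashes)
    return [f - earliest for f in flashes]


if __name__ == "__main__":
    # Made-up brightness values for three cameras that started at different times.
    streams = [
        [10, 11, 10, 200, 12],
        [10, 10, 11, 10, 210, 11],
        [9, 190, 10, 10],
    ]
    offsets = sync_offsets(streams)
    aligned = [s[o:] for s, o in zip(streams, offsets)]
    print(offsets, [flash_frame(s) for s in aligned])
```

A software alignment like this only matches recordings to the nearest frame; the hardware sync cables are what keep the individual exposures simultaneous.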
Q. How many cameras would I need for a full 360° volumetric capture?
A lot depends on exactly what you want to capture. Each camera has a field of view of 85.2° x 58° (+/- 3°) – that’s horizontal x vertical. In order to get a capture without holes in the data, you will want to make sure that none of the areas you want to capture are occluded from the cameras. If a camera can’t ‘see’ a particular place because it’s behind something else, it can’t capture that data. If you look at some of the captures we took, you can see that, since we were only using 4 cameras, things like hands and arms would at times occlude the parts of the person behind them. While our depth cameras are great, they can’t see through objects. For a full 360° you could start with 8 cameras in a ring around the participants to test, and then add more if you see occlusion in particular places.
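To make the geometry concrete, here is a small Python sketch for planning a ring of cameras. It places a given number of cameras evenly on a circle facing the center, and estimates the vertical span one camera sees at a given distance from the 58° vertical field of view quoted above. The ring radius, the 5-foot subject distance, and the function names are assumptions for illustration only; occlusion still has to be checked with real test captures.

```python
import math


def ring_positions(n_cameras, radius):
    """Return (x, y, yaw_degrees) for cameras evenly spaced on a circle,
    each one turned to face the center of the ring."""
    poses = []
    for i in range(n_cameras):
        angle = 2 * math.pi * i / n_cameras
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        yaw = (math.degrees(angle) + 180.0) % 360.0  # point back at the center
        poses.append((x, y, yaw))
    return poses


def vertical_span(camera_height, distance, vfov_deg=58.0):
    """Return the (lowest, highest) height visible at `distance` from a
    level camera mounted at `camera_height` (same units for all values)."""
    half = distance * math.tan(math.radians(vfov_deg / 2))
    return camera_height - half, camera_height + half


if __name__ == "__main__":
    for x, y, yaw in ring_positions(8, radius=5.0):
        print(f"x={x:6.2f} ft  y={y:6.2f} ft  yaw={yaw:6.1f} deg")
    low, high = vertical_span(camera_height=4.0, distance=5.0)
    print(f"visible from {low:.2f} ft to {high:.2f} ft")
```

With a camera at 4 feet and a subject about 5 feet away, the span works out to roughly 1.2 to 6.8 feet, which is consistent with capturing people from around knee height to above their heads.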
Q. Why is there so much noise around the edges, and why are the captures so grainy?
What we’re capturing isn’t film as you would normally think of it. We’re capturing what’s known as a point cloud – a set of positional data for every voxel in our scene. Each point also has color data associated with it. While we could convert this data into a mesh of polygons and get a smoother model, some noise will always be inherent to the capture method. The noise around the edges is mostly because we were not using a full array of cameras to surround people. There’s a degree of uncertainty in each voxel, so the more cameras we have capturing a particular area, the higher our confidence that the voxel is correctly placed. The noise around the edges just reflects some stray, lower-confidence points. With more cameras, or more time spent on post-processing and refining the captures, we could eliminate the noise for a more refined output.
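For readers who have not worked with point clouds, the sketch below shows one common way a depth image becomes colored 3D points, using the standard pinhole camera model. It is an illustration under stated assumptions, not the RealSense SDK's own code: the intrinsics `fx`, `fy`, `cx`, `cy` and the tiny 2 x 2 images are made-up values, and lens distortion is ignored.

```python
def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Turn pixel (u, v) with a depth in meters into a 3D point (x, y, z)
    in the camera's coordinate frame, using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_to_point_cloud(depth, color, fx, fy, cx, cy):
    """Build a list of (x, y, z, r, g, b) points from a depth image and a
    color image of the same size. Pixels with no depth (<= 0) are skipped."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # no depth reading for this pixel
            points.append(deproject(u, v, d, fx, fy, cx, cy) + tuple(color[v][u]))
    return points


if __name__ == "__main__":
    depth = [[0.0, 2.0],
             [2.0, 2.0]]
    color = [[(0, 0, 0), (255, 0, 0)],
             [(0, 255, 0), (0, 0, 255)]]
    for point in depth_to_point_cloud(depth, color, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

Pixels without a depth reading simply produce no point, which is one reason gaps and stray points show up where fewer cameras cover an area.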
Q. Can I use this capture system outdoors? On location? Are there colors that it can’t capture well? Don’t the cameras interfere with each other?
Many methods of capturing depth exist, and they all have different advantages and disadvantages. Some capture methods have a hard time working in bright sunlight, since there’s too much infrared light with the sun around. They can also have issues with other infrared sources, like other depth cameras. Some depth capture systems require complicated setups with many cameras, so they’re difficult to use on location, or in live performances. Some have a hard time capturing the color black, or capturing dark hair properly – often very dark colored objects appear as simply a void where there is no depth information. Our stereo depth cameras don’t suffer from any of these issues – for us, all infrared radiation is good information about our scene, which also helps us work well with dark colored objects. Since our cameras are small, they’re also very portable and easy to set up on location.
Q. What software did you use?
We used a modified version of the Intel RealSense Viewer software – all our software is open source and easily modifiable for your own use. We also used Unity, a game engine, for some post processing work. By putting our data into Unity, it was very easy for us to tweak the recordings, for example by using a bounding box around the important areas of action, which allowed us to ignore any points outside of that. Since some people had very bold and dramatic moves, this made it easy to customize our output. We then took the resulting captures and shared them using Sketchfab, which has great playback capabilities built right into the browser and social media platforms. We could also then allow people to download their own models if they wanted to.
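The bounding-box step described above was done in Unity, but the idea is easy to show in a few lines of Python. This is a minimal sketch, assuming each point is a tuple of `(x, y, z, r, g, b)` and the box is axis-aligned; the function name `crop_to_box` and the sample coordinates are hypothetical.

```python
def crop_to_box(points, box_min, box_max):
    """Keep only the points whose (x, y, z) position lies inside the
    axis-aligned box defined by box_min and box_max (inclusive)."""
    kept = []
    for p in points:
        position = p[:3]
        if all(lo <= c <= hi for c, lo, hi in zip(position, box_min, box_max)):
            kept.append(p)
    return kept


if __name__ == "__main__":
    points = [
        (0.0, 0.0, 1.0, 255, 0, 0),   # dancer, inside the box
        (3.0, 0.0, 1.0, 0, 255, 0),   # stray point far to the side
        (0.5, 1.5, 2.0, 0, 0, 255),   # raised hand, inside the box
    ]
    dancer = crop_to_box(points, box_min=(-1.0, 0.0, 0.5), box_max=(1.0, 2.0, 2.5))
    print(len(dancer), "of", len(points), "points kept")
```

Widening the box for a participant with bigger moves is then just a matter of changing the two corner values.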
Q. How big are the files? What do you do with all that data once you have it?
Over the course of 300+ captures of around 5 seconds each, we captured 875 Gigabytes of data. As to what you might do with your captures once they’ve been created, really the sky is the limit. Because of how the data is captured, you can modify it easily: add custom textures and shaders, place it in different environments, put it in a VR experience, allow people to directly interact with the models in interesting ways, or use it in a 2D film where you can move the models around. We’ve created the technology; now we need you, as creators, to envision new ways to use it.
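To get a feel for what those numbers mean per capture, here is a back-of-the-envelope calculation in Python. It assumes exactly 300 captures of exactly 5 seconds, with the data split evenly across the 4 cameras. Since the real count was "300+", the results are rough upper bounds rather than measured figures.

```python
total_gb = 875.0            # total data recorded, from the figures above
captures = 300              # assumed: the stated "300+" taken as exactly 300
seconds_per_capture = 5.0   # assumed: "around 5 seconds" taken as exactly 5
cameras = 4

gb_per_capture = total_gb / captures
gb_per_second = gb_per_capture / seconds_per_capture
mb_per_second_per_camera = gb_per_second * 1000 / cameras

print(f"{gb_per_capture:.1f} GB per capture")
print(f"{gb_per_second:.2f} GB per second across all cameras")
print(f"{mb_per_second_per_camera:.0f} MB per second per camera")
```

That comes to about 2.9 GB per capture, or on the order of 150 MB per second from each camera, which is worth planning storage and transfer times around.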
Learn more about Intel RealSense D400 series depth cameras.