While many people are familiar with Intel® RealSense™ technology as a producer of high-quality depth cameras, the introduction of the Intel® RealSense™ Tracking Camera T265 has raised many questions about the difference between our tracking and depth solutions, and the applications for each. With a long history of leadership in the depth camera space, Intel has produced many different types of depth devices, currently featuring both Stereo and Coded Light solutions, which are used in a wide variety of applications.
The depth and tracking cameras are sold separately, and each has many uses on its own; however, there are also many applications where combining depth information with an accurate V-SLAM tracking solution is beneficial. In this post, we will explore a few of those use cases and why adding tracking to depth leads to a better-together result.
In some ways it is useful to consider tracking and stereo depth as different versions of the same problem space: given a set of known information, both systems use algorithms to determine unknown information. In the case of a stereo depth camera, the known information is the baseline between the two camera lenses. By comparing the images from the two lenses and matching individual points of interest between them, the system can compute the depth of each pixel (and therefore the distance of each object): in effect, it draws a triangle between the point in space and the two lenses and uses trigonometry to infer the distance.
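For a rectified stereo pair, that trigonometry reduces to a simple relationship: depth = focal length × baseline ÷ disparity. A minimal sketch of the idea (the numbers below are hypothetical, not actual D400 calibration values):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a matched point from a rectified stereo pair:
    Z = f * B / d, with focal length in pixels and baseline in metres."""
    if disparity_px <= 0:
        return float("inf")  # no horizontal shift between views: point at infinity
    return focal_px * baseline_m / disparity_px

# A feature shifted 20 px between the two images, seen by lenses with a
# 640 px focal length and a 50 mm baseline, sits 1.6 m away:
z = depth_from_disparity(focal_px=640.0, baseline_m=0.050, disparity_px=20.0)
```

The smaller the disparity, the farther the point, which is also why stereo depth error grows with distance.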
Tracking, on the other hand, is concerned not with how far away a particular object is, but with the position and movement of the tracking camera itself. The T265 doesn't track external objects; its job is to track itself with a high degree of accuracy. It does that by using its own set of known information. It has two cameras with a known baseline, and it also contains an inertial measurement unit (IMU), a combination of gyroscopes and accelerometers similar to those in most modern cellphones. The cameras search for visually distinct features in the environment, match them between the two lenses, and then combine that information with the motion reported by the IMU to produce highly accurate, low-latency position data.
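As a rough intuition for how the two sources complement each other, here is a toy one-dimensional sketch: IMU integration updates fast but drifts, while visual fixes arrive more slowly but are drift-free, so a fusion step lets each cover the other's weakness. This is purely illustrative; the T265's actual V-SLAM pipeline is far more sophisticated, and all names and numbers here are hypothetical.

```python
def dead_reckon(position, velocity, accel, dt):
    """Integrate IMU acceleration: fast updates, but drift accumulates."""
    velocity = velocity + accel * dt
    position = position + velocity * dt
    return position, velocity

def fuse(imu_position, visual_position, alpha=0.98):
    """Toy complementary filter: lean on the IMU estimate short-term,
    and let slower visual feature matches pull accumulated drift back."""
    return alpha * imu_position + (1.0 - alpha) * visual_position
```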
For robotics, the key challenges are knowing where the robot is, adapting to a changing environment, and understanding that environment as a whole. A security robot that ends up in an ornamental fountain is no longer performing its safety functions, after all. A robot equipped with the T265 tracking camera can integrate wheel odometry data for a very accurate understanding of its position within the environment, but it would struggle to avoid the pond without virtual fencing, and it could not avoid a person walking in front of it. By adding a D435 or D415 depth camera, the robot can understand obstacles in real time, with no need to pre-map the space.
For example, suppose the security robot has identified something it needs to navigate towards – an object of interest. The data from the T265 can be used to plot a path to that object. Even if obstacles are placed in the robot's way, the D435 or D415 can identify them, allowing the robot to navigate around them. In the video below, we throw obstacles into the robot's path. Even though these appear at random, the robot can accurately assess each one and move around it before returning to a path that will take it to its target. Combining the two devices gives the robot a broader understanding of the space it is in, allowing it to create an occupancy map of the environment and navigate through it with ease. The depth camera (provided it is oriented correctly) could even identify the downward slope or steps towards the fountain and avoid those hazards too.
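The detour-around-obstacles behaviour can be illustrated with a toy planner over an occupancy grid. This is a hypothetical sketch; a real robot would use a full navigation stack, not this simplified breadth-first search.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search over an occupancy grid (0 = free, 1 = occupied).
    Returns a list of cells from start to goal, or None if the goal is blocked."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:   # walk the predecessor chain back to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# A wall appears mid-route; the planner detours around it.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
route = plan_path(grid, start=(0, 0), goal=(2, 2))
```

In a live system, the grid cells would be refreshed from the depth camera's output while the robot's own cell comes from the T265 pose, so the plan adapts as obstacles appear.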
In XR applications (i.e. VR, AR or MR), one of the most important technical challenges is tracking: understanding exactly how a headset or other viewing device, such as a tablet, is moving through space. In the case of head-mounted devices, a discrepancy between the motion of your head and the corresponding content you view can very quickly lead to severe motion sickness. Keeping latency under 20ms between motion and the reflection of that motion in the device's head-pose model is absolutely critical. The T265 was designed to provide tracking with latency under 6ms; some additional latency is added when rendering and displaying the appropriate frames to the viewer.
For that same device to understand and allow interaction between real and digital elements of the environment, a depth camera is required. In virtual reality, since the entire field of view of the user is obstructed, the addition of a depth camera can improve safety, allowing for automatic guardian or sentinel boundaries. Pets or people moving in front of the player can also be easily identified and avoided since the depth camera can be set to flag anything within a certain distance of the headset.
In augmented or mixed reality where the real world is either visible or displayed via pass-through cameras from the headset, integration of real and digital objects becomes more crucial for maintaining immersion. A digital character needs to be able to stand on a real table, and correctly disappear behind walls or foreground objects. The combination of T265 and a D400 depth camera allows digital objects to be placed “on” a real surface and stay exactly where they were put. Even if the user looks away from the object, when they look back, it will not have drifted, since the environmental understanding from the depth camera combines with the headset motion data from the T265. A depth camera can also be used to improve hand tracking, giving the user the power to manipulate digital objects with their hands.
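The "stays exactly where it was put" behaviour comes from re-expressing a fixed world-frame anchor point in the camera's current frame every time the pose updates. Below is a deliberately simplified top-down (2-D, yaw-only) sketch with hypothetical values; a real renderer would use the full 6-DoF pose.

```python
import math

def world_to_camera(p_world, cam_pos, cam_yaw):
    """Express a fixed world-frame point in the moving camera's frame
    (top-down view: x right, z forward, yaw about the vertical axis)."""
    dx = p_world[0] - cam_pos[0]
    dz = p_world[1] - cam_pos[1]
    c, s = math.cos(-cam_yaw), math.sin(-cam_yaw)
    return (c * dx - s * dz, s * dx + c * dz)

# A virtual object is anchored at world point (0, 1), e.g. on a table 1 m ahead.
# With the headset at the origin facing forward, it renders 1 m straight ahead:
ahead = world_to_camera((0.0, 1.0), cam_pos=(0.0, 0.0), cam_yaw=0.0)
# After the user turns 90 degrees, the same anchor renders off to the side,
# still 1 m away -- the world point itself never moved:
aside = world_to_camera((0.0, 1.0), cam_pos=(0.0, 0.0), cam_yaw=math.pi / 2)
```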
We have previously explored a variety of 3D scanning applications and use cases in a few different blog posts, but one thing we have not discussed at length is the challenge of full 3D reconstruction. If you are attempting to build a 3D model of an object or scene using just a single camera, inevitably either the camera or the object has to move (or be moved) to capture every side. If you have ever tried to take a panoramic photo, you may be familiar with the need to move slowly enough that each photograph lines up with the previous one. 3D scanning an object involves a similar process. By adding a T265 tracking camera to the system, the position and orientation data from the tracking camera can help the software correctly align hundreds of frames of depth data into a complete scan of an object or space.
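That alignment step can be sketched in miniature: each depth frame's points are transformed out of the camera frame into a shared world frame using the tracked pose, so overlapping frames land on top of each other. This is a simplified 2-D, yaw-only illustration with made-up values; real pipelines use full 6-DoF poses.

```python
import math

def to_world(point_cam, pose_t, pose_yaw):
    """Rotate and translate a camera-frame point into the shared world
    frame using the tracked pose (2-D: an x/z translation plus a yaw)."""
    x, z = point_cam
    c, s = math.cos(pose_yaw), math.sin(pose_yaw)
    return (c * x - s * z + pose_t[0], s * x + c * z + pose_t[1])

# The same physical point observed in two depth frames from two different
# camera poses maps to one world coordinate once each frame is aligned:
a = to_world((1.0, 0.0), pose_t=(0.0, 0.0), pose_yaw=0.0)
b = to_world((-1.0, 0.0), pose_t=(1.0, 1.0), pose_yaw=math.pi / 2)
# a and b coincide (up to floating-point error), so the scans fuse cleanly.
```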
By improving the accuracy of these scans in real time, new use cases for 3D scanning are opened up, from allowing real estate agents to scan properties with a tablet and then display them to prospective buyers with ease, to allowing construction project managers to inspect and document projects daily with never-before-seen completeness.
Most drones currently use a combination of GPS and IMU data to track their position. There are situations where this may not provide enough information, however. Drones flying in places where GPS is unavailable or poor – indoors, or near or underneath tall structures – can easily lose their position. Civilian GPS also has a certain amount of positional error: GPS-enabled smartphones are typically accurate to within a 4.9m radius under open sky, but accuracy worsens near buildings, bridges and trees. Adding a T265 to a drone allows the GPS data to be fused with the camera's own positional data, giving greater accuracy than either source would provide individually over long ranges. By further fusing a D435 depth camera into the system, similarly to the robotics example given earlier, the device can start to understand its environment and more easily avoid collisions with static or moving objects.
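One common way to fuse two noisy position estimates is inverse-variance weighting, the one-dimensional core of a Kalman-style update: the less-noisy source dominates, and the fused estimate is more certain than either input. This is an illustrative sketch with hypothetical numbers, not the fusion algorithm of any particular drone stack.

```python
def fuse_position(gps_pos, gps_var, vio_pos, vio_var):
    """Inverse-variance weighted fusion of two 1-D position estimates.
    Each source is weighted by the other's variance, so the more
    trustworthy (lower-variance) source pulls the result towards it."""
    w_gps = vio_var / (gps_var + vio_var)
    fused = w_gps * gps_pos + (1.0 - w_gps) * vio_pos
    fused_var = (gps_var * vio_var) / (gps_var + vio_var)
    return fused, fused_var

# GPS reads 10 m with a ~2 m sigma (variance 4); visual-inertial tracking
# reads 12 m with a ~1 m sigma (variance 1). The fused estimate leans
# towards the tracking data, and its variance drops below either input's:
pos, var = fuse_position(10.0, 4.0, 12.0, 1.0)
```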
The Intel® RealSense™ SDK 2.0 is open source and offers various code examples to get developers up and running with our technology quickly. For an example of depth and tracking integration, this ROS package can be used to generate a 2D occupancy map based on depth images and poses from depth and tracking cameras respectively.
Internally, it uses a 3D representation to transform point clouds (the common output of a depth camera) into a common reference frame, using the pose data from the T265 tracking camera, and to accumulate information over time. The accumulated measurements are mapped with each voxel (or volumetric pixel) having a probability ranging from zero (free) to 100 (occupied). This output can then be used for robotic or drone navigation.
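The per-voxel probability update described above is commonly implemented as a Bayesian log-odds update, so repeated evidence accumulates without any single noisy frame dominating. A minimal sketch of that standard formulation (illustrative only; the ROS package's exact implementation may differ):

```python
import math

def update_occupancy(p_prior, p_meas):
    """Bayesian log-odds update for one voxel's occupancy probability
    (0.0 = free, 1.0 = occupied; scale by 100 for the 0-100 map values).
    Log-odds add, so evidence from successive frames accumulates."""
    log_odds = (math.log(p_prior / (1.0 - p_prior))
                + math.log(p_meas / (1.0 - p_meas)))
    return 1.0 / (1.0 + math.exp(-log_odds))

# An uninformed voxel (0.5) hit twice by "occupied" measurements (0.7 each)
# grows steadily more confident that it is occupied:
p = update_occupancy(0.5, 0.7)  # first hit
p = update_occupancy(p, 0.7)    # second hit pushes confidence higher
```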
We have created a 3D printable mount available here for download. With the D435 and T265 mounted in this way, the bracket holds the cameras in a known alignment. The extrinsic calibration data for the mounted cameras is available here and should enable you to get a dual configuration of the two devices up and running very quickly.
In all the use cases above, either device could be used individually, but using them together multiplies their capabilities. These cameras were designed to work together flawlessly and to complement each other across a wide variety of use cases. If this sounds like something you would like to try, for a limited time we are offering these devices bundled together at a discount.