By Sergey Dorodnicov, Intel® RealSense™ SDK Manager
In this post, we’ll cover the basics of stereoscopic vision, including block-matching, calibration and rectification, depth from stereo with OpenCV, passive vs. active stereo, and the relation to structured light.
Regular consumer web-cams offer streams of RGB data within the visible spectrum that can be used for object recognition and tracking, as well as basic scene understanding.
Using a depth camera, you can add a brand‑new channel of information, with distance to every pixel. This new channel is used just like the others — for training and image processing, but also for measurement and scene reconstruction.
Depth from Stereo is a classic computer vision algorithm inspired by the human binocular vision system. It relies on two parallel view‑ports and calculates depth by estimating disparities between matching key‑points in the left and right images:
Depth from Stereo algorithm finds disparity by matching blocks in left and right images
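Once the disparity d of a pixel is known, depth follows directly from the focal length fx of the lens (in pixels) and the baseline B between the two view‑ports: depth = fx · B / d. For example, with the fx = 942.8 and baseline = 54.8 mm used in the code below, a disparity of 30 pixels corresponds to roughly 942.8 · 54.8 / 30 ≈ 1722 mm, or about 1.7 m.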
The most naive implementation of this idea is the SSD (Sum of Squared Differences) block‑matching algorithm:
import numpy as np

fx = 942.8        # lens focal length, in pixels
baseline = 54.8   # distance in mm between the two cameras
disparities = 64  # num of disparities to consider
block = 15        # block size to match
units = 0.001     # depth units

disparity = np.zeros(shape=left.shape).astype(float)
for i in range(block, left.shape[0] - block - 1):
    for j in range(block + disparities, left.shape[1] - block - 1):
        # calc SSD at all possible disparities
        ssd = np.empty([disparities, 1])
        l = left[(i - block):(i + block), (j - block):(j + block)]
        for d in range(0, disparities):
            r = right[(i - block):(i + block), (j - d - block):(j - d + block)]
            # cast to float to avoid uint8 overflow when subtracting
            diff = l.astype(np.float32) - r.astype(np.float32)
            ssd[d] = np.sum(diff ** 2)
        # select the best match
        disparity[i, j] = np.argmin(ssd)

# Convert disparity to depth
depth = np.zeros(shape=left.shape).astype(float)
depth[disparity > 0] = (fx * baseline) / (units * disparity[disparity > 0])
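Note that left and right here are assumed to be a rectified pair of equal‑size grayscale images, loaded beforehand; a minimal way to do that with OpenCV (the file names are hypothetical):

import cv2
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical file name
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name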
Rectified image pair used as input to the algorithm
Depth map produced by the naive SSD block-matching implementation
Point-cloud reconstructed using SSD block-matching
There are several challenges that any actual product has to overcome:
Having two exactly parallel view‑ports is challenging. While it is possible to generalize the algorithm to any two calibrated cameras (by matching along epipolar lines), the more common approach is image rectification. During this step, the left and right images are re‑projected onto a common virtual plane:
Image Rectification illustrated (Source: Wikipedia*)
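If you calibrate the camera pair yourself, OpenCV can compute these re‑projections for you. Below is a minimal sketch, assuming the intrinsics K1, K2, the distortion coefficients d1, d2, and the rotation R and translation T between the cameras were obtained beforehand (e.g. with cv2.stereoCalibrate), and that the resolution is 1280x720:

import cv2

size = (1280, 720)  # image resolution (example value)

# Rectification rotations (R1, R2) and new projection matrices (P1, P2)
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)

# Per-pixel remapping tables, then warp both images onto the common plane
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)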
The OpenCV library has everything you need to get started with depth from stereo:
Point-cloud generated using the OpenCV StereoBM algorithm
import numpy as np
import cv2

fx = 942.8         # lens focal length, in pixels
baseline = 54.8    # distance in mm between the two cameras
disparities = 128  # num of disparities to consider
block = 31         # block size to match
units = 0.001      # depth units

# left and right are the rectified 8-bit grayscale pair
sbm = cv2.StereoBM_create(numDisparities=disparities, blockSize=block)

# StereoBM returns fixed-point disparities scaled by 16
disparity = sbm.compute(left, right).astype(np.float32) / 16.0

# Convert disparity to depth
depth = np.zeros(shape=left.shape).astype(float)
depth[disparity > 0] = (fx * baseline) / (units * disparity[disparity > 0])
The average running time of StereoBM on an Intel® Core™ i5‑6600K CPU is around 110 ms, for an effective 9 frames per second (FPS).
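To reproduce this measurement on your own hardware, here is a simple sketch using Python’s standard time module (the trial count is arbitrary):

import time

trials = 100  # arbitrary number of repetitions
start = time.perf_counter()
for _ in range(trials):
    sbm.compute(left, right)
elapsed = time.perf_counter() - start
print("avg: %.1f ms, %.1f FPS" % (1000 * elapsed / trials, trials / elapsed))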
The quality of the results depends primarily on the density of visually distinguishable points (features) that the algorithm can match. Any source of texture, natural or artificial, will significantly improve the accuracy.
That’s why it’s extremely useful to have an optional texture projector, which adds detail to the scene, typically outside of the visible spectrum. In addition, the projector can serve as an artificial light source at night or in dark environments.
Input images illuminated with texture projector
Left: OpenCV StereoBM without the projector. Right: StereoBM with the projector.
Structured-Light is an alternative approach to depth from stereo. It relies on recognizing a specific projected pattern in a single image.
Structured‑light solutions do offer certain benefits; however, they are also fragile: any external interference, whether from the sun or from another structured‑light device, will prevent the camera from producing any depth.
In addition, because a laser projector must illuminate the entire scene, power consumption goes up with range, which often requires a dedicated power source.
Depth from stereo, on the other hand, only benefits from a multi-camera setup and can be used with or without a projector.
Intel RealSense D400 cameras:
Depth-map using Intel RealSense D415 stereo camera
Point-cloud using Intel RealSense D415 stereo camera
Just like OpenCV, Intel RealSense technology offers an open-source, cross-platform set of APIs for getting depth data.
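For example, here is a minimal depth-streaming sketch with the librealsense Python bindings (pyrealsense2), printing the distance at the center pixel; stream configuration and error handling are omitted:

import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()
try:
    frames = pipeline.wait_for_frames()  # blocks until a coherent frameset arrives
    depth = frames.get_depth_frame()
    # distance in meters at the center of the image
    x, y = depth.get_width() // 2, depth.get_height() // 2
    print("distance: %.3f m" % depth.get_distance(x, y))
finally:
    pipeline.stop()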
Check out these resources for more info: