python, opencv, computer-vision, stereo-3d, disparity-mapping

OpenCV 3D Point Cloud rendering in strange ways


I am a beginner to OpenCV. I am trying to convert a disparity map to a 3D point cloud but, the output of my 3D point cloud looks nothing like a 3D render of the 2D images. I am not sure if this is expected for the techniques that are available with OpenCV. I am hoping to find some help or advice to mitigate this issue.

I have created a disparity map as follows:

import cv2
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
from mpl_toolkits.mplot3d import Axes3D

# Load the left and right images in grayscale
left_image = cv2.imread('Adirondack-perfect/im0.png', 0)   # change to your left image path
right_image = cv2.imread('Adirondack-perfect/im1.png', 0)  # change to your right image path

# Initialize the stereo block matching object
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,  # Ensure it's a multiple of 16
    blockSize=5,  # Decrease for more detail
    P1=8 * 5**2,  # Consider lowering for less smoothness
    P2=32 * 5**2,  # Consider lowering for less smoothness
    disp12MaxDiff=10,  # Non-zero for left-right consistency check
    uniquenessRatio=10,  # Increase for more reliable matches
    speckleWindowSize=100,  # Increase to filter out noise
    speckleRange=32,  # Increase to filter out noise
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)


# Compute the disparity map
disparity_map = stereo.compute(left_image, right_image)
cv2.imwrite('disparity_map.png', disparity_map)

img = disparity_map.copy()
plt.imshow(img, 'CMRmap_r')

The output of this looks great:

disparity map

Left and Right image plus disp map with color bar

Now I am trying to convert this to a 3D point cloud using the following code:

# Intrinsic parameters of the camera
focal_length = 4161.221  # Assuming the focal length is the same in x and y directions
cx = 1445.577  # The x-coordinate of the principal point
cy = 984.686  # The y-coordinate of the principal point
baseline = 176.252  # The distance between the two camera centers

# Creating the Q matrix for reprojecting
Q = np.float32([
    [1, 0, 0, -cx],
    [0, 1, 0, -cy],
    [0, 0, 0, focal_length],
    [0, 0, -1/baseline, 0]
])

# Reproject the points to 3D
points_3D = cv2.reprojectImageTo3D(disparity_map, Q)

# Reshape the points to a 2D array where each row is a point
points = points_3D.reshape(-1, 3)

# Filter out points with a disparity of 0 (indicating no measurement)
mask_map = disparity_map > disparity_map.min()
filtered_points = points[mask_map.ravel()]

# Now, filtered_points contains the 3D coordinates of each pixel

# Visualization (optional)
# You can use matplotlib to create a scatter plot of the 3D points
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(filtered_points[:, 0], filtered_points[:, 1], filtered_points[:, 2], s=1)
plt.show()

However, the output looks like this:

3D point cloud

Can somebody please help me understand what is going wrong? What can I do to fix this scrambled output? Is this expected behavior for such a small dataset, given that I am not using any advanced techniques?

Here is where I am getting my dataset from:

https://vision.middlebury.edu/stereo/data/scenes2014/

Thanks

disparity matrix:

https://www.dropbox.com/scl/fi/f68qpimwxp8m487pdm3o6/disparity_map.npy?rlkey=hjm6s9jc2f8w4msm4ao56hq2d&dl=0


Solution

  • You're trying to calculate disparity, searching a range of 0..63 pixels.

    However, those input pictures are huge. The disparity of matching points reaches around 300 pixels for the nearest part of the armrest, and it's not just the armrest: most of the picture lies well beyond your search range.

    The SGBM matcher will try its best, but in those areas it simply cannot find good matches within 63 pixels, so the resulting values will be wild. The plot of the disparity map will look noisy in a characteristic way and show no depth that makes sense. That is what you're seeing.

    Your choice: either dial up the range of disparities (and the size of blocks!), or downsample the input images. I'd recommend the latter because it's cheaper. You can use a few applications of cv2.pyrDown() (halving each time), or a single cv2.resize() with interpolation=cv2.INTER_AREA and dsize=None, fx=0.25, fy=0.25.

    In this case, 0.25 (two halvings) brought the source data into the range of about 64 pixels of disparity. It's marginal, so you might want to grant the SGBM a few more pixels of disparity range.

    Remember that SGBM's disparity values are fixed point with 4 fractional bits, i.e. you have to multiply by 1/16 to get values in whole pixels. A raw value of -16 (-1.0 after scaling) stands for "no match". That scaling will be important when you try to re-project those image points to 3D.

    Here is the result of downsampling with factor 0.25 and then searching for disparities (-1 turned into NaN/black). The map looks reasonable.

    You can then scale the map back up to original resolution, if you want. That means resize() and also scaling the disparity values so they match the original resolution.

    grayscale output

    The black border on the left is "normal" for OpenCV's SGBM. I used to be able to explain why it's there, but can't right now. It might be an accident of implementation, or a bug, or a result of the theory/math. I believe that for those pixels the range to match would go partially outside the other picture, so instead of testing only part of the range, they just don't test anything at all. You can cheat using cv2.copyMakeBorder() on the inputs and then cropping the same amount off the disparity map, but that can easily give you artefacts.