I wish to propagate polygon labels from a source image to a target image. The target image is just the source image, but slightly translated. I found this code snippet that allows me to register a source image to a target image. If you write it as a function, it becomes:
import numpy as np
import cv2

def register_images(
    align: np.ndarray,
    reference: np.ndarray,
):
    """
    Registers two RGB images with each other.
    Args:
        align: Image to be aligned.
        reference: Reference image to be used for alignment.
    Returns:
        Registered image and transformation matrix.
    """
    # Convert to grayscale if needed
    _align = align.copy()
    _reference = reference.copy()
    if _align.shape[-1] == 3:
        _align = cv2.cvtColor(_align, cv2.COLOR_RGB2GRAY)
    if _reference.shape[-1] == 3:
        _reference = cv2.cvtColor(_reference, cv2.COLOR_RGB2GRAY)
    height, width = _reference.shape
    # Create ORB detector with up to 500 features
    orb_detector = cv2.ORB_create(500)
    # Find the keypoints and descriptors
    # The first arg is the image, second arg is the mask (not required in this case).
    kp1, d1 = orb_detector.detectAndCompute(_align, None)
    kp2, d2 = orb_detector.detectAndCompute(_reference, None)
    # Match features between the two images
    # We create a Brute Force matcher with Hamming distance as measurement mode.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    # Match the two sets of descriptors
    matches = list(matcher.match(d1, d2))
    # Sort matches by Hamming distance and keep the best 90 %
    matches.sort(key=lambda x: x.distance)
    matches = matches[:int(len(matches) * 0.9)]
    no_of_matches = len(matches)
    # Define empty matrices of shape no_of_matches * 2
    p1 = np.zeros((no_of_matches, 2))
    p2 = np.zeros((no_of_matches, 2))
    for i in range(len(matches)):
        p1[i, :] = kp1[matches[i].queryIdx].pt
        p2[i, :] = kp2[matches[i].trainIdx].pt
    # Find the homography matrix and use it to transform the colored image wrt the reference
    homography, mask = cv2.findHomography(p1, p2, cv2.RANSAC)
    transformed_img = cv2.warpPerspective(align, homography, (width, height))
    return transformed_img, homography
Now I can access the transformed image and the homography matrix used for aligning the two images. What I don't understand is how to apply the same transformation to the polygons and bounding boxes used to annotate the image.
In particular, annotations are in COCO format, which means you can access bounding box coordinates as follows:
x0, y0, width, height = bounding_box
And segmentations are a list of polygons, each stored as a flat list of coordinates:
segmentations = [poly1, poly2, poly3, ...] # segmentations are a list of polygons
for poly in segmentations:
    x_coords = poly[0::2] # x coordinates are the values at even indices in the poly list
    y_coords = poly[1::2] # y coordinates are the values at odd indices in the poly list
Once I access the x and y coordinates, how can I apply the homography matrix?
Given any polygon, run it through perspectiveTransform() along with the homography matrix. That is all.
perspectiveTransform() takes care of all the linear algebra, including extension of (x,y) points to (x,y,1), the matrix multiplication, dividing by that added w dimension, and dropping the added dimension.
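To make those steps concrete, here is a minimal NumPy-only sketch of the same arithmetic. The helper name apply_homography and the sample matrix H_shift are made up for illustration; in practice you would just call perspectiveTransform().

```python
import numpy as np


def apply_homography(points, homography):
    """Apply a 3x3 homography to an (N, 2) collection of (x, y) points."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    # Extend (x, y) to homogeneous (x, y, 1).
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])
    # Matrix multiplication: each row becomes H @ [x, y, 1].
    mapped = homogeneous @ np.asarray(homography, dtype=np.float64).T
    # Divide by the added w dimension, then drop it.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical homography: a pure translation by (+5, -3).
H_shift = np.array([[1.0, 0.0, 5.0],
                    [0.0, 1.0, -3.0],
                    [0.0, 0.0, 1.0]])
shifted = apply_homography([[10, 20], [40, 60]], H_shift)
print(shifted)
```

For a pure translation the w component stays 1, so the division changes nothing; it only matters once the last row of the matrix differs from (0, 0, 1).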
Make sure the polygon is given as a numpy array. If OpenCV decides to play dumb, make sure the array's shape is like (N, 1, 2) given N points of (x,y) coordinates. The dtype can also make it play dumb: perspectiveTransform() expects floating-point input (float32 or float64), so convert integer coordinates first.
Whatever type of box you start with, you'll have to calculate its corner points. Now it's a polygon. Next: see above.
If you transform an axis-aligned bounding box like that, it will likely no longer be axis-aligned, because of the homography (perspective, shear, rotation, ...). If you need an axis-aligned box around your transformed box or polygon, call boundingRect() on the set of points.
It's not enough to transform just the top left and bottom right corner of a box. If the transformation is a general rotation (or anything other than translation), and you interpreted the transformed points as the corners of a new axis-aligned box, that box would be broken. It would not align with the part of the image that it used to describe.
[Images omitted. Credits: first box picture from Creativity103 on Flickr; thumbnail of second box picture from someone on Getty.]
OpenCV homographies are given in the forward sense. They work as-is on points with perspectiveTransform().
You may, or may not, need to invert the homography. This depends on how you calculated it, i.e. which set of points you passed first to findHomography(). np.linalg.inv() would do that, and OpenCV has cv2.invert() as well.
The warp...() functions implicitly invert their given homography because the sampling algorithm needs it like that. When given the WARP_INVERSE_MAP flag, the passed homography is assumed to be inverted already, hence is not inverted implicitly, but used directly in the "pull"-sense sampling.