I am currently studying OpenCV's aruco module, with a particular focus on the pose estimation of ArUco markers and AprilTags.
Looking into the subpixel accuracy, I have encountered a strange behaviour, demonstrated by the code below: if I provide a "perfect" calibration (e.g. cx/cy equal to the image center and all distortion coefficients set to zero) and a "perfect" marker with known edge length, cv.detectMarkers only yields the correct value when the rotation is at 0/90/180 or 270 degrees. For all other orientations the subpixel routine yields an (almost) constant, but "shifted", value. It is clear that at exactly 0/90/180/270 degrees the corner pixels produce a sharp transition and can therefore be detected with high precision. However, I struggle to see where the underestimated length at all other angles originates from. Is this a bug or does it result from some trigonometry? --> Look at the graphs generated by the script below: the error in the pose results from the error in the corner detection, so the detection accuracy depends on the orientation of the code.
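To see how such a corner error propagates into the pose, here is a back-of-the-envelope sketch (my own addition, not part of the script; the 0.02 px figure is just an assumed example, taken up again in the answer at the bottom): for a centered, fronto-parallel marker under the ideal pinhole calibration used here, the depth follows Z ≈ fx · marker_length / marker_length_px, so an underestimated pixel length directly inflates the recovered Z.
fx = 1.0e4               # focal length in pixels, as in cam_matrix below
marker_length = 30.0     # world-space edge length, as in the script below
for measured_px in (700.0, 699.98):   # 699.98 px = hypothetical 0.02 px underestimate
    print(measured_px, "->", fx * marker_length / measured_px)
# a 0.02 px error on a 700 px edge is roughly 30 ppm, and Z shifts by the same relative amount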
I also checked ArUco markers and different subpixel refinement methods (see the sketch below). The "peaks" remain, although the angular behaviour in between changes.
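For reference, switching to the other refinement methods of the contrib aruco API looks like this (a sketch; the window/iteration values are, to my knowledge, just the library defaults and only relevant for the cornerSubPix-based method):
import cv2 as cv
para_subpix = cv.aruco.DetectorParameters_create()
para_subpix.cornerRefinementMethod = cv.aruco.CORNER_REFINE_SUBPIX   # classic cv.cornerSubPix refinement
para_subpix.cornerRefinementWinSize = 5
para_subpix.cornerRefinementMaxIterations = 30
para_subpix.cornerRefinementMinAccuracy = 0.1
para_contour = cv.aruco.DetectorParameters_create()
para_contour.cornerRefinementMethod = cv.aruco.CORNER_REFINE_CONTOUR  # contour-based refinement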
I am fairly sure that this is not caused by the interpolation involved in rotating the marker, since I can observe the same behaviour in real data as well (note, however, that the "height" of the peaks seems to depend somehow on the interpolation method; you can test this by changing the flag in cv.warpAffine, e.g. to cv.INTER_LINEAR, as sketched below).
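To probe that interpolation dependence, a small sketch (same synthetic setup as the full script further down, a fixed 45-degree rotation, and only the warpAffine flag varied):
import numpy as np
import cv2 as cv

dictionary = cv.aruco.getPredefinedDictionary(cv.aruco.DICT_APRILTAG_16H5)
para = cv.aruco.DetectorParameters_create()
para.cornerRefinementMethod = cv.aruco.CORNER_REFINE_APRILTAG

img = np.full((2048, 2448), 255, np.uint8)
img[674:1374, 874:1574] = cv.aruco.drawMarker(dictionary, 3, 700)
rows, cols = img.shape
rot_mat = cv.getRotationMatrix2D(((cols - 1) / 2, (rows - 1) / 2), 45, 1)

for name, flag in [("nearest", cv.INTER_NEAREST), ("linear", cv.INTER_LINEAR),
                   ("cubic", cv.INTER_CUBIC), ("lanczos4", cv.INTER_LANCZOS4)]:
    rot_img = cv.warpAffine(img, rot_mat, (cols, rows), flags=flag)
    corners, ids, _ = cv.aruco.detectMarkers(rot_img, dictionary, parameters=para)
    if ids is None:
        print(name, "no detection")
        continue
    c = corners[0].reshape(4, 2)
    mean_len = np.mean([np.linalg.norm(c[i] - c[(i + 1) % 4]) for i in range(4)])
    print(name, mean_len)   # the nominal value would be 700 px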
My questions would then be: Where does the underestimated edge length at the intermediate angles originate from? Is this a bug in the corner refinement, or does it follow from the trigonometry of the subpixel interpolation?
EDIT: Note that the AprilTag functions were added to OpenCV only recently, so you will need to upgrade to the newest version, which is not yet available in some standard repositories. You can, for example, get an up-to-date build from conda-forge. /EDIT
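A quick way to check whether the installed build already ships the AprilTag additions used below (a sketch; it only inspects the module for the constants the script relies on):
import cv2 as cv
print(cv.__version__)
print(hasattr(cv.aruco, "CORNER_REFINE_APRILTAG"),
      hasattr(cv.aruco, "DICT_APRILTAG_16H5"))
The full demonstration script: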
# -*- coding: utf-8 -*-
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
""" generate an "ideal" calibration with zero distortion and perfect alignment
of the main optical axis: """
cam_matrix = np.array([[1.0e+04, 0.0,     1.224e+03],
                       [0.0,     1.0e+04, 1.024e+03],
                       [0.0,     0.0,     1.0]])
dist_coeffs = np.zeros((14, 1))  # all 14 distortion coefficients set to zero
# define detection parameters
marker_length = 30.00 # some arbitrary value
marker_length_px = 700
marker_id = 3
dictionary = cv.aruco.getPredefinedDictionary(cv.aruco.DICT_APRILTAG_16H5)
para = cv.aruco.DetectorParameters_create()
para.cornerRefinementMethod = cv.aruco.CORNER_REFINE_APRILTAG
para.aprilTagDeglitch = 0
para.aprilTagMinWhiteBlackDiff = 30
para.aprilTagMaxLineFitMse = 20
para.aprilTagCriticalRad = 0.1745329201221466 * 6  # approx. 60 degrees in radians
para.aprilTagMinClusterPixels = 5
para.maxErroneousBitsInBorderRate = 0.35
para.errorCorrectionRate = 1.0
para.minMarkerPerimeterRate = 0.05
para.maxMarkerPerimeterRate = 4
para.polygonalApproxAccuracyRate = 0.05
para.minCornerDistanceRate = 0.05
marker_length_list = []
tvec_z_list = []
# generate pictures (the AprilTag with ID 3, centered in the image, is rotated in fixed angular steps of 10 degrees)
degrees_list = np.linspace(0, 350, 36, dtype=int).tolist()
marker = cv.aruco.drawMarker(dictionary, marker_id, marker_length_px)
img = np.zeros((2048, 2448), np.uint8)+255
img[674:1374, 874:1574] = marker
cv.imshow("Original", img)
cv.imwrite("original.png", img)
rows, cols = img.shape
for entry in degrees_list:
    # rotate the original picture about the image center, given as (x, y)
    rot_mat = cv.getRotationMatrix2D(((cols - 1) / 2, (rows - 1) / 2), entry, 1)
    rot_img = cv.warpAffine(img, rot_mat, (cols, rows), flags=cv.INTER_CUBIC)  # interpolation changes the "peak amplitude", e.g. try cv.INTER_LINEAR instead
    # detect the marker and get the pose estimate
    corners, ids, rejected = cv.aruco.detectMarkers(rot_img, dictionary, parameters=para)
    my_index = ids.tolist().index([marker_id])
    fCorners = corners[my_index]
    fRvec, fTvec, _obj_points = cv.aruco.estimatePoseSingleMarkers(fCorners, marker_length, cam_matrix, dist_coeffs)
    # calculate the edge length (in pixels) for each of the four sides
    c = fCorners[0]
    L1 = np.linalg.norm(c[0] - c[1])
    L2 = np.linalg.norm(c[0] - c[3])
    L3 = np.linalg.norm(c[2] - c[1])
    L4 = np.linalg.norm(c[2] - c[3])
    mean_length = (L1 + L2 + L3 + L4) / 4
    # append results
    marker_length_list.append(mean_length)
    tvec_z_list.append(fTvec[0][0][2])
plt.figure("TVEC Z")
plt.plot(degrees_list, tvec_z_list, "go--")
plt.xlabel("marker rotation angle (°)")
plt.ylabel("TVEC Z (units of length)")
plt.figure("Mean marker length (should be 700)")
plt.plot(degrees_list, marker_length_list, "bo--")
plt.xlabel("marker rotation angle (°)")
plt.ylabel("marker length (pixels)")
EDIT2: As suggested by Christoph Rackwitz, here is the output generated by the script as pictures:
Answer by Christoph Rackwitz (see comments):
You're getting a mere 2% of a pixel of difference (30 ppm in length, 30 ppm in Z). warpAffine uses fixed-point math with 5 fractional bits, plus cubic interpolation, plus the corner refinement method (which hopefully works on the edges, not the corners)... and all of this happens in a "linear colorspace" (the warpAffine). If you fed this real pictures, those are gamma-mapped, and webcams apply sharpening filters, so you'd get even wilder results.
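To put a number on that sub-pixel corner error, here is a minimal follow-up sketch (my own addition, not from the question or the comments): the corners detected in the unrotated image are mapped through the same rotation matrix with cv.transform and compared with the corners detected in the rotated image, which isolates the orientation-dependent corner error from the pose estimation. If the answer above is right, the error should come out in the range of the quoted "2% of a pixel".
import numpy as np
import cv2 as cv

dictionary = cv.aruco.getPredefinedDictionary(cv.aruco.DICT_APRILTAG_16H5)
para = cv.aruco.DetectorParameters_create()
para.cornerRefinementMethod = cv.aruco.CORNER_REFINE_APRILTAG

img = np.full((2048, 2448), 255, np.uint8)
img[674:1374, 874:1574] = cv.aruco.drawMarker(dictionary, 3, 700)
rows, cols = img.shape

corners0, ids0, _ = cv.aruco.detectMarkers(img, dictionary, parameters=para)
ref = corners0[0]   # shape (1, 4, 2); only one marker in the image

for angle in (10, 30, 45, 60, 80):
    rot_mat = cv.getRotationMatrix2D(((cols - 1) / 2, (rows - 1) / 2), angle, 1)
    rot_img = cv.warpAffine(img, rot_mat, (cols, rows), flags=cv.INTER_CUBIC)
    corners, ids, _ = cv.aruco.detectMarkers(rot_img, dictionary, parameters=para)
    expected = cv.transform(ref, rot_mat)   # reference corners mapped into the rotated image
    # the corner order reported by detectMarkers follows the marker's own
    # orientation, so corner i should correspond to corner i here
    err = np.linalg.norm(expected - corners[0], axis=2)
    print(angle, "max corner error [px]:", err.max())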