I detected the ArUco marker and estimated the pose (see the image below). However, the Xt (X translation) I get is a positive value. According to the drawAxis function, the positive direction is away from the image center, so I thought it was supposed to be negative. Why am I getting a positive value instead?
My camera is about 120 mm away from the imaging surface, but I am getting a Zt (Z translation) in the range of 650 mm. Is pose estimation giving the pose of the marker with respect to the physical camera or to the image plane center? I don't understand why Zt is so high.
I kept measuring the pose while changing Z and obtained roll, pitch, and yaw. I noticed that roll (rotation w.r.t. the camera X-axis) keeps flipping its sign back and forth while its magnitude varies between 166 and 178 degrees, but the sign of Xt did not change with the sign change in roll. Any thoughts on why it behaves like that?
Any suggestions for getting more consistent data?
import math

import cv2 as cv
import numpy as np
from scipy.spatial.transform import Rotation as R

# fname, mtx (camera matrix), dst (distortion coefficients) and
# aruco_marker_side_length come from my setup/calibration code (not shown).

image = cv.imread(fname)
arucoDict = cv.aruco.Dictionary_get(cv.aruco.DICT_4X4_1000)
arucoParams = cv.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv.aruco.detectMarkers(image, arucoDict,
                                                  parameters=arucoParams)
print(corners, ids, rejected)

if len(corners) > 0:
    # flatten the ArUco IDs list
    ids = ids.flatten()
    # extract the marker corners (which are always returned in
    # top-left, top-right, bottom-right, and bottom-left order)
    (topLeft, topRight, bottomRight, bottomLeft) = corners[0][0]
    # convert each of the (x, y)-coordinate pairs to integers
    topRight = (int(topRight[0]), int(topRight[1]))
    bottomRight = (int(bottomRight[0]), int(bottomRight[1]))
    bottomLeft = (int(bottomLeft[0]), int(bottomLeft[1]))
    topLeft = (int(topLeft[0]), int(topLeft[1]))

    # draw the bounding box of the ArUco detection
    cv.line(image, topLeft, topRight, (0, 255, 0), 2)
    cv.line(image, topRight, bottomRight, (0, 255, 0), 2)
    cv.line(image, bottomRight, bottomLeft, (0, 255, 0), 2)
    cv.line(image, bottomLeft, topLeft, (0, 255, 0), 2)

    # compute and draw the center (x, y)-coordinates of the ArUco marker
    cX = int((topLeft[0] + bottomRight[0]) / 2.0)
    cY = int((topLeft[1] + bottomRight[1]) / 2.0)
    cv.circle(image, (cX, cY), 4, (0, 0, 255), -1)

    # estimate the in-plane rotation from the left and top edges
    if topLeft[1] != topRight[1] or topLeft[0] != bottomLeft[0]:
        rot1 = np.degrees(np.arctan((topLeft[0] - bottomLeft[0])
                                    / (bottomLeft[1] - topLeft[1])))
        rot2 = np.degrees(np.arctan((topRight[1] - topLeft[1])
                                    / (topRight[0] - topLeft[0])))
        rot = (np.round(rot1, 3) + np.round(rot2, 3)) / 2
        print(rot1, rot2, rot)
    else:
        rot = 0

    # draw the position and rotation on the image
    rotS = ",rotation:" + str(np.round(rot, 3))
    cv.putText(image, "position: " + str(cX) + "," + str(cY),
               (100, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 80), 2)
    cv.putText(image, rotS,
               (400, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 80), 2)
    print("[INFO] ArUco marker ID: {}".format(ids))

    # average length of the two diagonals, in pixels
    d = np.round((math.dist(topLeft, bottomRight)
                  + math.dist(topRight, bottomLeft)) / 2, 3)

    # Get the rotation and translation vectors
    rvecs, tvecs, obj_points = cv.aruco.estimatePoseSingleMarkers(
        corners, aruco_marker_side_length, mtx, dst)

    # The pose of the marker is with respect to the camera lens frame.
    # Imagine you are looking through the camera viewfinder;
    # the camera lens frame's:
    #   x-axis points to the right
    #   y-axis points straight down towards your toes
    #   z-axis points straight ahead, away from your eye, out of the camera

    # Store the translation (i.e. position) information
    transform_translation_x = tvecs[0][0][0]
    transform_translation_y = tvecs[0][0][1]
    transform_translation_z = tvecs[0][0][2]

    # Store the rotation information
    rotation_matrix = np.eye(4)
    rotation_matrix[0:3, 0:3] = cv.Rodrigues(np.array(rvecs[0]))[0]
    r = R.from_matrix(rotation_matrix[0:3, 0:3])
    quat = r.as_quat()

    # Quaternion format
    transform_rotation_x = quat[0]
    transform_rotation_y = quat[1]
    transform_rotation_z = quat[2]
    transform_rotation_w = quat[3]

    # Euler angle format in radians
    roll_x, pitch_y, yaw_z = euler_from_quaternion(
        transform_rotation_x, transform_rotation_y,
        transform_rotation_z, transform_rotation_w)
    roll_x = math.degrees(roll_x)
    pitch_y = math.degrees(pitch_y)
    yaw_z = math.degrees(yaw_z)
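Here euler_from_quaternion is a small helper doing the standard quaternion-to-Euler conversion, roughly like this (sketched; any equivalent implementation works):

def euler_from_quaternion(x, y, z, w):
    # standard quaternion (x, y, z, w) -> roll/pitch/yaw in radians
    t0 = 2.0 * (w * x + y * z)
    t1 = 1.0 - 2.0 * (x * x + y * y)
    roll_x = math.atan2(t0, t1)

    t2 = 2.0 * (w * y - z * x)
    t2 = max(-1.0, min(1.0, t2))  # clamp against numerical noise
    pitch_y = math.asin(t2)

    t3 = 2.0 * (w * z + x * y)
    t4 = 1.0 - 2.0 * (y * y + z * z)
    yaw_z = math.atan2(t3, t4)

    return roll_x, pitch_y, yaw_z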
Disclaimer: this goes for OpenCV v4.5.5 and the corresponding aruco module (contrib repo). They redid a lot of the aruco stuff for v4.6.0 and v4.7.0, so best double-check everything I say here.
Without checking all the code (looks roughly okay), a few basics about OpenCV and aruco:
Both use right-handed coordinate systems. Thumb X, index Y, middle Z.
OpenCV uses X right, Y down, Z far, for screen/camera frames. Origin for screens and pictures is the top left corner. For cameras, the origin is the center of the pinhole model, which would be the center of the aperture. I can't comment on lenses or lens systems. Assume the lens center is the origin. That's probably close enough.
Aruco uses X right, Y far, Z up, if the marker is lying flat on a table. Origin is in the center of the marker. The top left corner of the marker is considered the "first" corner.
The marker can be considered to have its own coordinate system/frame.
The pose given by rvec and tvec is the pose of the marker in the camera frame. That means np.linalg.norm(tvec) gives you the direct distance from the camera to the marker's center. tvec's Z is just the component parallel to the optical axis.
If the marker is in the right half of the picture ("half" defined by camera matrix's cx,cy), you'd expect tvec's X to grow. Lower half, Y positive/growing.
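A quick way to sanity-check that against your own numbers (reusing tvecs, mtx, and the pixel center cX from the question's code, so those names are assumptions carried over):

tvec = tvecs[0][0]                      # marker center in the camera frame
direct_distance = np.linalg.norm(tvec)  # straight-line camera-to-marker distance
depth = tvec[2]                         # component parallel to the optical axis

cx = mtx[0, 2]                          # principal point, x
print(direct_distance, depth)
print(tvec[0] > 0, cX > cx)             # should agree: marker right of center <=> positive X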
Conversely, the rvec/tvec transformation transforms marker-local coordinates to camera-local coordinates. Try transforming some marker-local points, such as the origin or points on the axes. I believe that cv::transform can help with that. Using OpenCV's projectPoints to map 3D space points to 2D image points, you can then draw the marker's axes, or a cube on top of it, or anything you like.
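A minimal sketch of the cube idea, reusing image, rvecs, tvecs, mtx, dst, and aruco_marker_side_length from the question's code (those names are assumptions carried over from there):

s = aruco_marker_side_length / 2.0
# cube corners in marker-local coordinates: base in the marker plane (Z=0),
# top lifted one side length along the marker's Z axis
cube = np.float32([
    [-s, -s, 0], [s, -s, 0], [s, s, 0], [-s, s, 0],
    [-s, -s, 2 * s], [s, -s, 2 * s], [s, s, 2 * s], [-s, s, 2 * s],
])
img_pts, _ = cv.projectPoints(cube, rvecs[0], tvecs[0], mtx, dst)
pts = [(int(x), int(y)) for x, y in img_pts.reshape(-1, 2)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0),
         (4, 5), (5, 6), (6, 7), (7, 4),
         (0, 4), (1, 5), (2, 6), (3, 7)]
for i, j in edges:
    cv.line(image, pts[i], pts[j], (0, 255, 255), 2)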
Say the marker sits upright and faces the camera dead-on. When you consider the frame triads of the marker and the camera in space ("world" space), both would be X "right", but one's Y and Z are opposite the other's Y and Z, so you'd expect to see a rotation around the X axis by half a turn (rotating Z and Y).
You could imagine the transformation to happen like this:
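Roughly, for any marker-local point (a sketch, assuming rvecs/tvecs from estimatePoseSingleMarkers as in the question):

R_cm, _ = cv.Rodrigues(rvecs[0])            # 3x3 rotation: marker frame -> camera frame
t_cm = tvecs[0].reshape(3, 1)               # marker origin, expressed in the camera frame

p_marker = np.array([[0.0], [0.0], [0.0]])  # marker center, marker-local
p_camera = R_cm @ p_marker + t_cm           # same point, camera-local

For the marker's origin this just reproduces tvec, which is another quick consistency check.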
If you get implausible values, check aruco_marker_side_length and the camera matrix. f would be around 500-3000 for typical resolutions (VGA to 4K) and fields of view (60-80 degrees).
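For a rough check (example numbers, not taken from the question):

width_px = 1920                            # horizontal resolution
hfov_deg = 70                              # horizontal field of view
f = (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
print(f)                                   # ~1371; compare with fx = mtx[0, 0]

If fx in your camera matrix is wildly outside that ballpark, the calibration (or the matrix you pass in) is suspect.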