The MediaPipe documentation states: "x and y: Landmark coordinates normalized to [0.0, 1.0] by the image width and height respectively." However, I am getting values outside that range.
Environment: mediapipe 0.10.1, Python 3.8.10
#!/usr/bin/env python3
import cv2
import mediapipe as mp
import time

class HumanPoseDetection:
    def __init__(self):
        # TODO: change the path
        model_path = "/home/user/models/pose_landmarker_full.task"
        BaseOptions = mp.tasks.BaseOptions
        self.PoseLandmarker = mp.tasks.vision.PoseLandmarker
        PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
        VisionRunningMode = mp.tasks.vision.RunningMode
        self.result = None
        self.options = PoseLandmarkerOptions(
            base_options=BaseOptions(model_asset_path=model_path),
            running_mode=VisionRunningMode.LIVE_STREAM,
            result_callback=self.callback
        )

    def callback(self, result, output_image, timestamp_ms):
        # Called asynchronously with the landmarks of each processed frame.
        if result.pose_landmarks:
            self.result = result.pose_landmarks[0]
            for elem in self.result:
                if not (0 <= elem.x <= 1 and 0 <= elem.y <= 1):
                    print("Warning, out-of-range values: {}".format(elem))

    def detect_pose(self):
        cap = cv2.VideoCapture(0)
        with self.PoseLandmarker.create_from_options(self.options) as landmarker:
            while cap.isOpened():
                ret, image = cap.read()
                if not ret:
                    break
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                image = cv2.resize(image, (224, 224))
                mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)
                # detect_async requires monotonically increasing timestamps
                frame_timestamp_ms = int(time.time() * 1000)
                landmarker.detect_async(mp_image, frame_timestamp_ms)

if __name__ == "__main__":
    HPD_ = HumanPoseDetection()
    HPD_.detect_pose()
A workaround proposed here is to use min(); in my case I need the normalized x and y, not the pixel coordinates. Also, this workaround doesn't seem to be accurate:
x_px = min(math.floor(normalized_x * image_width), image_width - 1)
y_px = min(math.floor(normalized_y * image_height), image_height - 1)
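I could of course clamp the normalized values directly instead, for example:

x_norm = min(max(normalized_x, 0.0), 1.0)
y_norm = min(max(normalized_y, 0.0), 1.0)

but that only masks the problem; I'd like to understand why the values leave the range in the first place.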
Can you please tell me how I can solve this issue? Thanks in advance.
The coordinates from pose estimation will be outside the range [0, 1] if the estimated position of the keypoint is off-screen. For example, if I put my hand below the webcam's field of view, the y coordinate will be greater than 1.
This is because the coordinates are normalized to the image height and width, but the pose estimator still provides estimates for keypoints it can't see.
As the visibility of an off-screen keypoint should be low, you could filter out these keypoints by raising your visibility threshold when you create the pose estimator.
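You can also filter per landmark in your callback. A minimal sketch, assuming each landmark exposes an optional visibility score (as the Tasks NormalizedLandmark does) and using 0.5 as an example threshold:

VISIBILITY_THRESHOLD = 0.5  # example value, tune for your use case

def callback(self, result, output_image, timestamp_ms):
    if result.pose_landmarks:
        # Keep only landmarks the model believes are actually visible.
        self.result = [
            lm for lm in result.pose_landmarks[0]
            if lm.visibility is not None and lm.visibility >= VISIBILITY_THRESHOLD
        ]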
According to the example here (https://github.com/googlesamples/mediapipe/blob/main/examples/pose_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Pose_Landmarker.ipynb), you should be able to modify your code as follows to add min_pose_detection_confidence:
self.options = PoseLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.LIVE_STREAM,
    min_pose_detection_confidence=0.5,
    result_callback=self.callback
)
I have used 0.5, or 50%, as an example; your results may be better with a different threshold. See min_pose_detection_confidence in the documentation: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker/python#live-stream
Alternatively, if you don't mind the pose estimator returning keypoints while they're off-screen, there may be no problem with keeping them; just treat any landmark outside [0, 1] as off-screen.
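If you go that route, a small sketch of such a check (is_on_screen is a hypothetical helper, not a MediaPipe API):

def is_on_screen(landmark):
    # Normalized coordinates inside [0, 1] correspond to pixels inside the frame.
    return 0.0 <= landmark.x <= 1.0 and 0.0 <= landmark.y <= 1.0

# e.g. inside the callback:
# on_screen = [lm for lm in result.pose_landmarks[0] if is_on_screen(lm)]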