python opencv variables computer-vision detection

How can I extract the x and y position of the face detected in mediapipe?

I am trying to use this code to get the x and y coordinates of the face position in real time. I got the code from the MediaPipe solutions examples online. When this code is run, the face is detected and its features are marked with red dots on the displayed frame. I want to get the coordinates of the face as integers so that I can later track its position with a servo motor. Is there any way I can do that?

# face detection

import cv2
import mediapipe as mp
import time

mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils


# capture video
cap = cv2.VideoCapture(2)
prevTime = 0

with mp_face_detection.FaceDetection(
    model_selection=1,
    min_detection_confidence=0.65) as face_detection:
  while True:
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      break


    #Convert the BGR image to RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = face_detection.process(image)

    # Draw the face detection annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.detections:
      for detection in results.detections:
        mp_drawing.draw_detection(image, detection)
        print(detection)  # I can see the score, x, y, ...


    cv2.imshow('BlazeFace Face Detection', image)
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()

I tried printing the variable detection in the for loop and I can clearly see that it contains the x and y coordinates, but I failed to extract that specific information. Any idea how to better manipulate this variable? I will be using the number of faces detected, the coordinates of their positions, and the confidence level.

Solution

Look at the structure of the result of print(detection):

label_id: 0
score: 0.8402262330055237
location_data {
  format: RELATIVE_BOUNDING_BOX
  relative_bounding_box {
    xmin: 0.4553905725479126
    ymin: 0.6456842422485352
    width: 0.24106884002685547
    height: 0.32147008180618286
  }
  relative_keypoints {
    x: 0.45961669087409973
    y: 0.7614946961402893
  }
[...]
}

These fields are attributes of the detection object, which is of type mediapipe.framework.formats.detection_pb2.Detection. I will assume that by "coordinates of the face" you mean its bounding box coordinates.

You can access these coordinates like this:

relative_bbox = detection.location_data.relative_bounding_box
my_relative_bbox_list = [relative_bbox.xmin, relative_bbox.ymin,
                         relative_bbox.width, relative_bbox.height]
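
Since the question asks for integers, note that these values are relative (fractions of the frame size), so they need to be multiplied by the image width and height. Below is a minimal sketch of that conversion, which also collects the confidence and the box center for each face. The helper names bbox_to_pixels and summarize_detections are hypothetical (they are not part of MediaPipe), the 640x480 frame size is an assumption, and a SimpleNamespace stand-in mirrors the printed structure so that the snippet runs without a camera.

```python
from types import SimpleNamespace


def bbox_to_pixels(relative_bbox, image_width, image_height):
    """Convert a relative bounding box (fractions of the frame) to integer pixels."""
    x = int(relative_bbox.xmin * image_width)
    y = int(relative_bbox.ymin * image_height)
    w = int(relative_bbox.width * image_width)
    h = int(relative_bbox.height * image_height)
    return x, y, w, h


def summarize_detections(detections, image_width, image_height):
    """Return one dict per face: confidence, pixel bounding box, and box center."""
    faces = []
    # results.detections is None when no face is found, hence the "or []"
    for detection in detections or []:
        box = bbox_to_pixels(
            detection.location_data.relative_bounding_box,
            image_width, image_height)
        x, y, w, h = box
        faces.append({
            "score": detection.score[0],  # score is a repeated field
            "bbox": box,
            "center": (x + w // 2, y + h // 2),
        })
    return faces


# Stand-in object (hypothetical) carrying the values printed above
fake_detection = SimpleNamespace(
    score=[0.8402262330055237],
    location_data=SimpleNamespace(
        relative_bounding_box=SimpleNamespace(
            xmin=0.4553905725479126,
            ymin=0.6456842422485352,
            width=0.24106884002685547,
            height=0.32147008180618286)))

# Assumed frame size of 640x480
faces = summarize_detections([fake_detection], 640, 480)
print(len(faces))          # number of faces detected
print(faces[0]["bbox"])    # (291, 309, 154, 154)
print(faces[0]["center"])  # (368, 386)
```

Inside the loop from the question, the frame size can be read with `height, width = image.shape[:2]` and the function called as `summarize_detections(results.detections, width, height)`. The box can extend slightly outside the frame, so it may be worth clamping the values before sending them to the servo.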