python numpy opencv mediapipe

MediaPipe gives different results for image file path input and NumPy array input


As you may know, Mediapipe provides landmark locations based on the aligned output image rather than the input image.

Objective: I intend to perform landmark detection on multiple images. Below, I’ve included code that uses PoseLandmarkerOptions to identify 33 body landmarks. After locating these landmarks, I plan to classify the face angle as either 0 degrees, 90 degrees, 180 degrees, or 270 degrees.

Data: I have included sample images from the MARS dataset because I was unable to share my original images, which have a higher resolution and larger dimensions than the MARS images.

1 2 3 4 5 6 7 8 9

all images as a compressed file:

Code: I have provided the main code to detect landmarks in the images.

import sys
import cv2
import numpy as np
import glob
import os
import base64
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from typing import Dict


base_options = python.BaseOptions(
    model_asset_path="./models/pose_landmarker.task",
    delegate=python.BaseOptions.Delegate.GPU,
)

options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
)
detector = vision.PoseLandmarker.create_from_options(options)


def check_landmarks(detection_result, img, address):
    file_name = address.split("/")[-1]
    # NumPy image shape is (height, width, channels)
    h, w, _ = img.shape
    for each_person_pose in detection_result.pose_landmarks:
        for each_key_point in each_person_pose:
            if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                # Landmarks are normalized: scale x by width and y by height
                x_px = int(each_key_point.x * w)
                y_px = int(each_key_point.y * h)
                cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
    cv2.imwrite("./landmarks/" + file_name, img)


def rectifier(detector, image, address):
    try:
        srgb_image = mp.Image.create_from_file(address)
        detection_result = detector.detect(srgb_image)
        check_landmarks(detection_result, srgb_image.numpy_view(), address)
    except Exception as e:
        print(f"error {e}")


def rectify_image(rectify_image_request):
    image = cv2.imdecode(
        np.frombuffer(base64.b64decode(rectify_image_request["image"]), np.byte),
        cv2.IMREAD_COLOR,
    )
    rectifier(detector, image, rectify_image_request["address"])


def read_image_for_rectify(address: str) -> Dict:
    face_object = dict()
    img = cv2.imread(address)
    _, buffer = cv2.imencode(".jpg", img)
    img = base64.b64encode(buffer).decode()
    face_object["image"] = img
    face_object["address"] = address
    return face_object


folder_path = "./png2jpg"
file_paths = glob.glob(os.path.join(folder_path, "*.jpg"), recursive=True)
for id_file, file in enumerate(file_paths):
    print(id_file, file)
    rectify_image(read_image_for_rectify(file))

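The pixel conversion in check_landmarks depends on NumPy's (height, width, channels) shape order: the normalized x is scaled by the image width and y by the height. Below is a minimal sketch of that conversion in isolation, assuming the landmark coordinates are normalized by the image size. The helper name to_pixel is hypothetical and not part of MediaPipe, and the clamping to the image bounds is an added assumption rather than something the code above does.

```python
from typing import Tuple

import numpy as np


def to_pixel(x_norm: float, y_norm: float, img: np.ndarray) -> Tuple[int, int]:
    """Convert normalized (x, y) to integer pixel coordinates inside img."""
    height, width = img.shape[:2]  # NumPy image shape is (rows, cols, channels)
    # Clamp so that coordinates on or outside the border stay drawable.
    x_px = min(max(int(x_norm * width), 0), width - 1)
    y_px = min(max(int(y_norm * height), 0), height - 1)
    return x_px, y_px


img = np.zeros((256, 128, 3), dtype=np.uint8)  # 128 px wide, 256 px tall
print(to_pixel(0.5, 0.5, img))  # (64, 128)
```

Unpacking the shape as height first and width second keeps the variable names consistent with what they hold, which matters as soon as the image is not square.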
Problem: Initially, I passed image file paths directly to MediaPipe, and the results were acceptable.

1 2 3 4 5 6 7 8 9

However, I now need to receive images as dictionaries with the image data encoded in base64. To handle this, I decode the image and pass it to MediaPipe as a NumPy array, but in this scenario MediaPipe fails to detect landmarks in many of the images. The only change is to this line, from

srgb_image = mp.Image.create_from_file(address)

into

srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)

Output in the second scenario:

1 2 3 4 5 6 7 8 9

How can I achieve consistent output in both scenarios?


Solution

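  • Thanks to Christoph Rackwitz's suggestion, converting the image from BGR to RGB before passing it to MediaPipe yields the same results as in the first case. cv2.imdecode returns pixels in BGR order, whereas mp.ImageFormat.SRGB expects RGB.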

    The rectifier function should be rewritten as follows:

    def rectifier(detector, image, address):
        try:
            # srgb_image = mp.Image.create_from_file(address)
            srgb_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            detection_result = detector.detect(srgb_image)
            check_landmarks(detection_result, srgb_image.numpy_view(), address)
        except Exception as e:
            print(f"error {e}")
    

    Additionally, channel swapping should also be implemented in the check_landmarks function where the image is written:

    def check_landmarks(detection_result, img, address):
        file_name = address.split("/")[-1]
        # NumPy image shape is (height, width, channels)
        h, w, _ = img.shape
        for each_person_pose in detection_result.pose_landmarks:
            for each_key_point in each_person_pose:
                if each_key_point.presence > 0.5 and each_key_point.visibility > 0.5:
                    x_px = int(each_key_point.x * w)
                    y_px = int(each_key_point.y * h)
                    cv2.circle(img, (x_px, y_px), 3, (255, 0, 0), 2)
        # cv2.imwrite expects BGR, so convert back before saving
        cv2.imwrite("/home/nvs/landmarks/" + file_name, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
    

    The following parameters have been set for Mediapipe:

    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    

    However, with these settings MediaPipe still fails to detect landmarks in some images, such as the one shown below:

    7

    This is acceptable, because at these thresholds the false positive rate is lower.
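
The channel order issue can be reproduced without MediaPipe. OpenCV decodes images as BGR, so a red pixel coming from cv2.imdecode is read as blue by anything that expects RGB unless the channels are reversed first. Below is a minimal NumPy-only sketch of that reversal. The helper bgr_to_rgb is hypothetical and is meant only as a stand-in for cv2.cvtColor(image, cv2.COLOR_BGR2RGB) on 3-channel uint8 images.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) image: B, G, R -> R, G, B."""
    reversed_view = image[..., ::-1]
    # The reversed view is not C-contiguous; copy it into a contiguous array,
    # since native bindings often require contiguous pixel data.
    return np.ascontiguousarray(reversed_view)


bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 2] = 255  # pure red when the array is interpreted as BGR
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # [255, 0, 0]
```

The same reversal applied a second time restores the BGR order, which is why the image is converted back before cv2.imwrite in check_landmarks.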