
Mediapipe pose SegmentationMask python javascript differences


I am developing a pose recognition web app using the MediaPipe Pose library (https://google.github.io/mediapipe/solutions/pose.html).

I am using the segmentationMask to find some specific points of the human body that satisfy a constraint (the value in the n-th pixel must be > 0.1).

I am able to do this evaluation in Python. The library returns the segmentation mask as a matrix with the same width and height as the input image, containing values in [0.0, 1.0], where 1.0 and 0.0 indicate high certainty of a “human” and “background” pixel respectively. So I can iterate over the matrix and find the points that satisfy the constraint.

I am trying to do the same thing in JavaScript, but I have a problem. The JavaScript version of the library does not return a matrix; it returns an ImageBitmap, which the HTML canvas uses to draw the mask. With an ImageBitmap I cannot access every point of the matrix, so I am not able to find the points I am interested in.

Is there a way to transform the JavaScript segmentationMask ImageBitmap so that it is similar to the segmentationMask of the Python version of the library, or at least to retrieve the same information? I need the values in the range [0.0, 1.0] for every pixel of the image.

Thank you all.


Solution

  • There is unfortunately no direct way to get an ImageData from an ImageBitmap, but you can drawImage() this ImageBitmap on a cleared canvas and then call ctx.getImageData(0, 0, canvas.width, canvas.height) to retrieve an ImageData that gives you access to all the pixel data.

    The confidence will be stored in the Alpha channel (every fourth item in imageData.data) as a value between 0 and 255.

    function onResults(results) {
      canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
      canvasCtx.drawImage(results.segmentationMask, 0, 0,
                          canvasElement.width, canvasElement.height);
      const imgData = canvasCtx.getImageData(0, 0, canvasElement.width, canvasElement.height);
      let i = 0;
      for (let y = 0; y < imgData.height; y++) {
        for (let x = 0; x < imgData.width; x++) {
          // Alpha is the fourth byte of each RGBA pixel (0 to 255)
          const confidence = imgData.data[i + 3];
          // do something with confidence here
          i += 4; // advance to the next pixel
        }
      }
    }
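
    To get something closer to the Python matrix, the alpha bytes can be rescaled to [0.0, 1.0] and scanned for the points above the threshold. This is only a sketch: maskToConfidence and pointsAbove are hypothetical helper names (not part of MediaPipe), and the input is assumed to be an ImageData-like object with width, height and RGBA data.

    ```javascript
    // Hypothetical helpers, not part of MediaPipe.
    // Convert the alpha channel of an ImageData-like object into a
    // row-major Float32Array of confidences in [0.0, 1.0].
    function maskToConfidence(imgData) {
      const { width, height, data } = imgData;
      const confidence = new Float32Array(width * height);
      for (let p = 0; p < confidence.length; p++) {
        // Each pixel takes 4 bytes (R, G, B, A); alpha is the fourth one.
        confidence[p] = data[p * 4 + 3] / 255;
      }
      return confidence;
    }

    // Collect the {x, y} coordinates whose confidence exceeds the threshold.
    function pointsAbove(confidence, width, threshold = 0.1) {
      const points = [];
      for (let p = 0; p < confidence.length; p++) {
        if (confidence[p] > threshold) {
          points.push({ x: p % width, y: Math.floor(p / width) });
        }
      }
      return points;
    }
    ```

    Inside onResults this could be used as pointsAbove(maskToConfidence(imgData), imgData.width). Keep in mind that the values are quantized to 256 levels, so they will not match the Python floats exactly.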
    

    And since you're gonna read a lot from that context, don't forget to pass the willReadFrequently option when you get it.
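
    For illustration, requesting such a context could look like the sketch below. getReadbackContext is a hypothetical wrapper name, and canvas is assumed to be the canvas element the mask is drawn on.

    ```javascript
    // Hypothetical wrapper: request a 2D context hinted for frequent
    // getImageData() calls. `canvas` is assumed to be an HTMLCanvasElement.
    function getReadbackContext(canvas) {
      // Pass the option on the first getContext("2d") call for this canvas;
      // later calls are expected to return the already-created context.
      return canvas.getContext("2d", { willReadFrequently: true });
    }
    ```

    With the names from the snippet above, this would be const canvasCtx = getReadbackContext(canvasElement);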

    As a fiddle since StackSnippets won't allow the use of the camera.


    Note that, depending on what you do, you may want to colorize this image from red to black using globalCompositeOperation and treat the data as a Uint32Array, where the confidence would be expressed between 0 and 0xFF000000.
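
    As a rough sketch of the Uint32Array idea only (the colorizing step is left out), the alpha byte can be read from a 32-bit view of the same buffer. alphaFromUint32 is a hypothetical helper name, and the code assumes a little-endian platform, where each RGBA pixel reads as 0xAABBGGRR.

    ```javascript
    // Hypothetical helper: read per-pixel alpha through a 32-bit view.
    // Assumes little-endian byte order, so alpha is the top byte of each
    // 32-bit value (the 0xFF000000 mask).
    function alphaFromUint32(rgbaBytes) {
      // One Uint32 per pixel, sharing the same underlying buffer (no copy).
      const pixels = new Uint32Array(
        rgbaBytes.buffer, rgbaBytes.byteOffset, rgbaBytes.length / 4);
      const alpha = new Uint8Array(pixels.length);
      for (let p = 0; p < pixels.length; p++) {
        alpha[p] = pixels[p] >>> 24; // unsigned shift keeps the value in 0..255
      }
      return alpha;
    }
    ```

    In onResults this would be called as alphaFromUint32(imgData.data). Reading one 32-bit value per pixel avoids the manual stride of 4 used with the byte array.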