image-processing, image-segmentation, google-vision, google-mlkit

Poor selfie segmentation with Google ML Kit


I am using Google ML Kit to do selfie segmentation (https://developers.google.com/ml-kit/vision/selfie-segmentation). However, the output I am getting is extremely poor.
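For reference, the segmenter is invoked roughly like this (a minimal Kotlin sketch using the standard ML Kit selfie segmentation API; the `segment` helper and the surrounding code are just for illustration):

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.segmentation.Segmentation
import com.google.mlkit.vision.segmentation.SegmentationMask
import com.google.mlkit.vision.segmentation.selfie.SelfieSegmenterOptions

fun segment(bitmap: Bitmap) {
    // Single-image mode (as opposed to STREAM_MODE, which is meant for video frames).
    val options = SelfieSegmenterOptions.Builder()
        .setDetectorMode(SelfieSegmenterOptions.SINGLE_IMAGE_MODE)
        .build()
    val segmenter = Segmentation.getClient(options)

    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
    segmenter.process(image)
        .addOnSuccessListener { mask: SegmentationMask ->
            // mask.buffer holds one float per pixel (row-major): the confidence
            // that the pixel belongs to the person (the foreground).
            val buffer = mask.buffer
            val width = mask.width
            val height = mask.height
            // ... turn the confidences into the pink overlay / composite shown below
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```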

Initial image:

[image: original photo]

Segmented image with overlay: observe how the woman's hair is marked pink and the gym equipment and surroundings near her legs are marked non-pink. Even her hands are marked pink (meaning they are treated as background).

[image: segmentation overlay]

When this is overlaid on another image to create a background removal effect, it looks terrible:

[image: background-replaced composite]
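The composite itself is the usual per-pixel alpha blend with the mask confidence as alpha, along these lines (a sketch; the `replaceBackground` helper is just for illustration, it assumes the foreground, background, and mask all have the same dimensions, and it uses slow getPixel/setPixel calls for readability):

```kotlin
import android.graphics.Bitmap
import android.graphics.Color
import com.google.mlkit.vision.segmentation.SegmentationMask

// Blend the selfie over a new background, using each pixel's foreground
// confidence as the alpha value.
fun replaceBackground(foreground: Bitmap, background: Bitmap, mask: SegmentationMask): Bitmap {
    val width = mask.width
    val height = mask.height
    val out = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
    val buffer = mask.buffer
    buffer.rewind()
    for (y in 0 until height) {
        for (x in 0 until width) {
            val alpha = buffer.getFloat()  // confidence that (x, y) is the person
            val fg = foreground.getPixel(x, y)
            val bg = background.getPixel(x, y)
            fun mix(f: Int, b: Int) = (f * alpha + b * (1 - alpha)).toInt()
            out.setPixel(x, y, Color.rgb(
                mix(Color.red(fg), Color.red(bg)),
                mix(Color.green(fg), Color.green(bg)),
                mix(Color.blue(fg), Color.blue(bg))
            ))
        }
    }
    return out
}
```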

The segmentation mask returned by ML Kit has a confidence of 1.0 for all of the non-pink areas above, meaning it is absolutely certain that the non-pink areas are part of the person!

I am seeing this for several images, not just this one. In fact, the performance (confidence) is pretty poor for an image segmenter.
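The confidence figure above comes from scanning the raw mask buffer, roughly as below (a quick diagnostic sketch; the `countConfidentForeground` helper and the 0.99 threshold are just for illustration):

```kotlin
import com.google.mlkit.vision.segmentation.SegmentationMask

// Count how many pixels the model reports as near-certain foreground (person).
fun countConfidentForeground(mask: SegmentationMask, threshold: Float = 0.99f): Int {
    val buffer = mask.buffer
    buffer.rewind()
    var count = 0
    repeat(mask.width * mask.height) {
        if (buffer.getFloat() >= threshold) count++
    }
    return count
}
```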

The question is: is there a way to improve this, perhaps by providing a different/better model? If I use something like PixelLib, the segmentation is far better, but that library is not low-latency, so it can't run on a mobile device.

Any pointers/help regarding this would be really appreciated.


Solution

  • It might be too optimistic to expect a lightweight, real-time, CPU-based selfie model to provide accurate segmentation results for such a complex and, in some ways, tricky scene (the pose, and the black color of both the background and the outfit).

    The official example highlights the fact that complex environments are likely to be a problem.

    [image: official example]

    The only "simple" way of processing your scene is to use depth estimation. Just did a quick test with a pretty complex model:

    [image: depth estimation result]

    The results are far from usable (at least in a fully automated way). There are several other options:

    [image: results of other options]
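For anyone who wants to try the depth-based idea on-device, here is a rough sketch, assuming a MiDaS-style monocular depth model converted to TensorFlow Lite. The `DepthMasker` class, the model buffer, the 256x256 input size, the output shape and the depth cutoff are all assumptions and depend on the depth model you pick:

```kotlin
import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Rough person mask from monocular depth: treat everything "closer" than a
// cutoff as the subject.
class DepthMasker(modelBuffer: ByteBuffer) {
    private val interpreter = Interpreter(modelBuffer)
    private val size = 256  // assumed model input resolution

    fun estimateMask(source: Bitmap, depthCutoff: Float): Array<FloatArray> {
        val scaled = Bitmap.createScaledBitmap(source, size, size, true)

        // Pack the bitmap into a [1, 256, 256, 3] float buffer, RGB scaled to 0..1.
        val input = ByteBuffer.allocateDirect(4 * size * size * 3).order(ByteOrder.nativeOrder())
        for (y in 0 until size) {
            for (x in 0 until size) {
                val p = scaled.getPixel(x, y)
                input.putFloat(((p shr 16) and 0xFF) / 255f)  // R
                input.putFloat(((p shr 8) and 0xFF) / 255f)   // G
                input.putFloat((p and 0xFF) / 255f)           // B
            }
        }
        input.rewind()

        // Assumed output shape [1, 256, 256]; MiDaS-style models emit inverse
        // depth, so larger values mean closer to the camera.
        val depth = Array(1) { Array(size) { FloatArray(size) } }
        interpreter.run(input, depth)

        // Threshold the relative depth map into a binary foreground mask.
        return Array(size) { y ->
            FloatArray(size) { x -> if (depth[0][y][x] > depthCutoff) 1f else 0f }
        }
    }
}
```

Thresholding a relative depth map only separates the subject cleanly when she is much closer to the camera than everything else, which is why the result above is not usable in a fully automated way.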