I am using Google ML Kit to do selfie segmentation (https://developers.google.com/ml-kit/vision/selfie-segmentation). However, the output I am getting is extremely poor:
Initial image:
Segmented image with overlay: Observe how the woman's hair is marked pink, and the gym equipment and surroundings near her legs are marked non-pink. Even her hands are marked pink (meaning they are treated as background).
When this is overlaid on another image to create a background-removal effect, it looks terrible.
The segmentation mask returned by ML Kit has a confidence of 1.0 for all the non-pink areas above, meaning it is absolutely certain that the non-pink areas are part of the person!
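For reference, this is roughly how I am applying the mask — compositing with the confidence values as a soft alpha rather than a hard pink/non-pink cut. This is only a sketch: the `foreground`, `background`, and `confidence` arrays are hypothetical stand-ins for the decoded camera frames and the per-pixel float mask that ML Kit's segmentation mask buffer provides.

```python
import numpy as np

def composite(foreground, background, confidence, threshold=0.5, feather=2):
    """Blend foreground over background using a soft confidence mask.

    foreground, background: HxWx3 float arrays in [0, 1]
    confidence: HxW float array in [0, 1], where 1.0 means "person"
    (the per-pixel convention of the segmentation mask buffer).
    """
    # Map confidence to alpha, suppressing low-confidence pixels
    # instead of hard-thresholding everything to 0 or 1.
    alpha = np.clip((confidence - threshold) / (1.0 - threshold), 0.0, 1.0)
    # Cheap feather: box-blur the alpha a few times to soften the edge.
    for _ in range(feather):
        padded = np.pad(alpha, 1, mode="edge")
        alpha = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:] +
                 padded[1:-1, 1:-1]) / 5.0
    alpha = alpha[..., None]  # broadcast alpha over the color channels
    return alpha * foreground + (1.0 - alpha) * background
```

Even with the feathered soft mask, the result looks terrible because the confidence values themselves are wrong, not because of how they are blended.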
I am seeing this for several images, not just this one. In fact, the performance (confidence) is pretty poor for an image segmenter.
The question is: is there a way to improve it, maybe by providing a different/better model? If I use something like PixelLib, the segmentation is way better, but that library is not low latency and hence can't be run on mobile.
Any pointers/help regarding this would be really appreciated.
It might be too optimistic to expect a lightweight, real-time, CPU-based selfie model to provide accurate segmentation results for a pretty complex and, in a way, tricky scene (the pose, the black color of both the background and the outfit).
The official example highlights the fact that complex environments are likely to be a problem.
The only "simple" way of processing your scene is to use depth estimation. I just did a quick test with a pretty complex model:
The results are too far from usable (at least in a fully automated way). There are several other options:
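To illustrate the depth-estimation idea: once a monocular depth model (MiDaS-style, relative depth) has produced a depth map, the crudest way to turn it into a person mask is to threshold by depth percentile. This sketch uses a hypothetical `depth` array in place of real model output; the percentile cutoff is an assumption that must be tuned per scene, which is exactly why this is hard to automate.

```python
import numpy as np

def depth_to_mask(depth, percentile=60):
    """Derive a crude subject/background mask from a monocular depth map.

    depth: HxW array where larger values mean closer to the camera
    (the convention of MiDaS-style relative depth models).
    Pixels closer than the chosen percentile are treated as "person".
    """
    cutoff = np.percentile(depth, percentile)
    return (depth >= cutoff).astype(np.float32)
```

The per-scene cutoff, plus depth bleed between the subject and objects she touches (the gym equipment here), is why the results are not usable without manual cleanup.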