ios · macos · image-processing · apple-vision

How to use Apple's machine learning to lift subjects from the background?


In iOS 16 you can lift the subject from an image or isolate the subject by removing the background.

You can see it in action here: https://developer.apple.com/wwdc22/101?time=1101

I wonder whether this feature is also available for developers to use in their own apps. One could probably train a machine learning model and use it with the Vision framework.

Here's an example of how to implement this myself; however, Apple's solution is already good and I don't want to spend time reinventing the wheel when there's a shortcut.


Solution

  • Apple's Core ML framework and the DeepLabV3 image segmentation model are what you are looking for.

    The DeepLabV3 model has been trained to recognize and segment common objects (the 20 Pascal VOC classes, such as person, car, dog, and cat).

    VNCoreMLRequest is the Vision API that runs the Core ML model. It accepts a completion handler in which you can read the results for an image, namely VNCoreMLFeatureValueObservation objects.

    The VNCoreMLFeatureValueObservation object gives you the segmentation map of the picture, which is what you were looking for. Removing the background then comes down to masking out one of these segments.

    A complete, nicely written, step-by-step guide is here.

    The main part from that link is as follows:

    
    // Run the DeepLabV3 segmentation request on a background queue.
    func runVisionRequest() {
        guard let model = try? VNCoreMLModel(for: DeepLabV3(configuration: .init()).model)
        else { return }

        let request = VNCoreMLRequest(model: model, completionHandler: visionRequestDidComplete)
        request.imageCropAndScaleOption = .scaleFill

        DispatchQueue.global().async {
            guard let cgImage = self.inputImage.cgImage else { return }
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try handler.perform([request])
            } catch {
                print(error)
            }
        }
    }
    
    // Extract the segmentation map and convert it to an image.
    // image(min:max:) and resizedImage(for:) come from a third-party
    // helper library (e.g. CoreMLHelpers), not from Apple's SDK.
    func visionRequestDidComplete(request: VNRequest, error: Error?) {
        DispatchQueue.main.async {
            if let observations = request.results as? [VNCoreMLFeatureValueObservation],
               let segmentationMap = observations.first?.featureValue.multiArrayValue {

                // Turn the raw multi-array into a grayscale mask image.
                let segmentationMask = segmentationMap.image(min: 0, max: 1)

                self.outputImage = segmentationMask!.resizedImage(for: self.inputImage.size)!

                self.maskInputImage()
            }
        }
    }
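The code above ends by calling `maskInputImage()`, which the guide defines elsewhere. A minimal sketch of that step, assuming the same `inputImage`/`outputImage` properties as above and using Core Image's `CIBlendWithMask` filter to keep the subject and make the background transparent, could look like this (an illustrative sketch, not the guide's exact implementation):

```swift
import UIKit
import CoreImage
import CoreImage.CIFilterBuiltins

// Hypothetical masking step: blend the original photo against a transparent
// background, using the segmentation mask produced in visionRequestDidComplete.
func maskInputImage() {
    guard let input = CIImage(image: inputImage),
          let mask = CIImage(image: outputImage) else { return }

    // A fully transparent background the same size as the input image.
    let background = CIImage(color: .clear).cropped(to: input.extent)

    let filter = CIFilter.blendWithMask()
    filter.inputImage = input          // the subject's pixels where the mask is white
    filter.backgroundImage = background // transparent where the mask is black
    filter.maskImage = mask

    let context = CIContext()
    if let result = filter.outputImage,
       let cgResult = context.createCGImage(result, from: result.extent) {
        // The lifted subject with the background removed.
        outputImage = UIImage(cgImage: cgResult)
    }
}
```

Since DeepLabV3's mask covers whole object classes rather than a single instance, the result is coarser than Apple's built-in subject lifting, but it works back to iOS 13.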