machine-learning · computer-vision · ocr · handwriting-recognition

Need to segment each number from the image separately


I have created a CNN model using the MNIST dataset, and I want to predict the sequence of numbers present in an image. The plan is to segment each digit and feed it to the model individually, but I am having difficulty segmenting the digits because there are two different types of images. I need a robust technique that removes all the noise and shadows and segments each number separately. I am sharing the images below. I am looking for a robust technique and the code.

[five sample images: handwritten number sequences, some binary and some color photos with noise and shadows]

UPDATED QUESTION

Main goal of segmentation

I am looking for a method that segments each number in the images above. I could apply separate methods for binary and color images, but I want to learn a single robust approach that works for all of the images shown.


Solution

  • To compensate for uneven illumination, a standard technique is to first estimate the illumination and then divide the image by that estimate.

    The background is a white sheet of paper, so that's great. I'll estimate illumination with a median blur. The kernel size must be large enough such that no part of the foreground (written text) remains.

    This will also, coincidentally, correct white balance. If the text were colored, it'd still be colored.

    import cv2 as cv
    import numpy as np

    im = cv.imread("KifRNuGy.jpg")

    # estimate the illumination with a median blur; the kernel must be
    # large enough that no foreground stroke survives the blur
    illumination = cv.medianBlur(im, 101)

    # divide out the illumination; the background lands near 1.0
    compensated = im / illumination

    # arbitrary 0.8 to keep the bright background within range
    compensated = (0.8 * 255 * np.clip(compensated, 0, 1)).astype(np.uint8)

    [image: compensated result]


    For the segmentation, perform a morphological "closing". That will erase the fine lines. You can then take connected components of the result and read off their bounding boxes, but crop the actual digit images from the source, because the morphology will have distorted the handwritten digits. A sketch of this step follows the result images below.

    [images: segmentation results]
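
    A minimal sketch of that pipeline, continuing from the compensated image above. The closing kernel size, the Otsu binarization, and the area filter are assumptions to be tuned, not values from the answer:

    gray = cv.cvtColor(compensated, cv.COLOR_BGR2GRAY)

    # closing (dilation then erosion of the bright background) removes
    # thin dark lines; the kernel size is an assumption, tune it to the
    # stroke width of your pen
    kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (9, 9))
    closed = cv.morphologyEx(gray, cv.MORPH_CLOSE, kernel)

    # binarize so the remaining dark marks become white foreground blobs
    _, mask = cv.threshold(closed, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)

    # one connected component per digit, plus label 0 for the background
    n, labels, stats, centroids = cv.connectedComponentsWithStats(mask)

    digits = []
    for i in range(1, n):                  # skip label 0 (background)
        x, y, w, h, area = stats[i]
        if area < 100:                     # assumed noise filter
            continue
        # crop from the source image, not from the morphed mask
        digits.append(compensated[y:y+h, x:x+w])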

    Or with a fixed 256x256 region:

    [images: results with fixed 256x256 regions]
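
    A sketch of the fixed-size variant, reusing the centroids from the connected-components pass above and assuming the image is at least 256 px in each dimension:

    SIZE = 256
    H, W = compensated.shape[:2]

    regions = []
    for i in range(1, n):
        cx, cy = centroids[i]
        # 256x256 window centred on the blob, clamped to the image borders
        x = int(np.clip(cx - SIZE // 2, 0, W - SIZE))
        y = int(np.clip(cy - SIZE // 2, 0, H - SIZE))
        regions.append(compensated[y:y+SIZE, x:x+SIZE])

    Each crop can then be resized to 28x28 and inverted to match MNIST's white-digit-on-black convention before being fed to the CNN.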