python, opencv, image-processing

How to remove small connected objects using OpenCV


I am using OpenCV with Python, and I want to remove the small connected objects from my image.

I have the following binary image as input:

Input binary image

The image is the result of this code:

dilation = cv2.dilate(dst,kernel,iterations = 2)
erosion = cv2.erode(dilation,kernel,iterations = 3)

I want to remove the objects highlighted in red:

[Image: the input image with the objects to remove highlighted in red]

How can I achieve this using OpenCV?


Solution

  • Use connectedComponentsWithStats (doc):

    import cv2
    import numpy as np

    # Start by finding all of the connected components (white blobs in your image).
    # 'im' must be a single-channel 8-bit image; non-zero pixels are treated as foreground.
    nb_blobs, im_with_separated_blobs, stats, _ = cv2.connectedComponentsWithStats(im)
    # im_with_separated_blobs is a label image: each detected blob has its own pixel value, ranging from 1 to nb_blobs - 1.
    # The background pixels have value 0.
    

    im_with_separated_blobs looks like this: [image]

    # 'stats' (and the discarded output 'centroids') provide information about the blobs. See the docs for details.
    # Here, we are only interested in the size of the blobs:
    sizes = stats[:, cv2.CC_STAT_AREA]
    # You can also index the column with '-1' instead of 'cv2.CC_STAT_AREA', as it is the last column.

    # A small gotcha: the background is also considered a blob, so its stats are stored at index 0 of the stats array.

    # Minimum size (in pixels) of the blobs we want to keep.
    # Here it is a fixed value, but you can set it however you want, e.g. to the mean of the sizes.
    min_size = 150
    
    # Create an empty output image which will contain only the biggest components.
    # Using 'im' as the template keeps the output 8-bit (the label image is 32-bit).
    im_result = np.zeros_like(im)

    # For every component in the image, keep it only if its size is at least min_size.
    # We start at 1 to avoid considering the background.
    for index_blob in range(1, nb_blobs):
        if sizes[index_blob] >= min_size:
            im_result[im_with_separated_blobs == index_blob] = 255
    

    im_result looks like this: [image]


    Bonus: An alternative way of coding the same process, which takes advantage of indexing a NumPy array with another array and of the numpy.where function. It is probably more Pythonic insofar as it avoids the for loop, but it is also possibly less flexible regarding other filtering conditions (and perhaps less readable):

    nb_blobs, im_with_separated_blobs, stats, _ = cv2.connectedComponentsWithStats(im)
    
    sizes = stats[:, cv2.CC_STAT_AREA]
    
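    # sizes[im_with_separated_blobs] maps every pixel to the size of the blob it belongs to.
    # Background pixels (label 0) are already 0 in 'im', so they stay 0 either way.
    im_result = np.where(sizes[im_with_separated_blobs] >= min_size, im, 0)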
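
The indexing trick is easier to see on a tiny example. The sketch below uses only NumPy and a hand-written label array standing in for the output of connectedComponentsWithStats; the array contents and the use of `np.bincount` to compute the sizes are assumptions made for illustration.

```python
import numpy as np

# Hand-written label image, in the layout connectedComponentsWithStats returns:
# 0 = background, 1 = a 4-pixel blob, 2 = a 1-pixel blob.
labels = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
], dtype=np.int32)

# The matching binary image (255 wherever there is a blob).
im = np.where(labels > 0, 255, 0).astype(np.uint8)

# Pixel count per label, standing in for stats[:, cv2.CC_STAT_AREA].
sizes = np.bincount(labels.ravel())  # [7, 4, 1]

min_size = 2

# Indexing 'sizes' with the label image replaces every label by the size
# of its blob, giving an array with the same shape as the image.
size_per_pixel = sizes[labels]

# Keep the original pixel where its blob is big enough, otherwise write 0.
im_result = np.where(size_per_pixel >= min_size, im, 0)
print(im_result)
# [[  0 255 255   0]
#  [  0 255 255   0]
#  [  0   0   0   0]]
```

The 1-pixel blob disappears because its size is below `min_size`, while the background stays 0 regardless of its size because it is already 0 in `im`.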