python opencv image-segmentation watershed

How can I improve Watershed segmentation of heterogenous structures in Python?

I'm following a simple approach to segment cells (microscopy images) using the Watershed algorithm in Python. I'm happy with the result 90% of the time, but I have two main problems: (i) the markers/contours are really "spiky" and (2) the algorithm sometimes fails when two cells are to close to each other (i.e they are segmented together). Can you give some tips in how to improve it?

Here's the code I'm using and an output image showing my 2 issues.

# Adjustable parameters for a future function
img_file = NP_file
sigma = 9 # size of gaussian blur kernel; has to be an even number
alpha = 0.2 #scalling factor distance transform
clear_border = False
remove_small_objects = True

# read image and covert to gray scale 
im = cv2.imread(NP_file, 1)
im = enhanceContrast(im)
im_gray = cv2.cvtColor(im.copy(), cv2.COLOR_BGR2GRAY)

# Basic Median Filter
im_blur = cv2.medianBlur(im_gray, ksize = sigma)

# Threshold Image
th, im_seg = cv2.threshold(im_blur, im_blur.mean(), 255, cv2.THRESH_BINARY);

# filling holes in the segmented image
im_filled = binary_fill_holes(im_seg)

# discard cells touching the border
if clear_border == True: 
    im_filled = skimage.segmentation.clear_border(im_filled)

# filter small particles
if remove_small_objects == True: 
    im_filled = sk.morphology.remove_small_objects(im_filled, min_size = 5000)

# apply distance transform
# labels each pixel of the image with the distance to the nearest obstacle pixel.
# In this case, obstacle pixel is a boundary pixel in a binary image.

dist_transform = cv2.distanceTransform(img_as_ubyte(im_filled), cv2.DIST_L2, 3)

# get sure foreground area: region near to center of object
fg_val, sure_fg = cv2.threshold(dist_transform, alpha * dist_transform.max(), 255, 0)

# get sure background area: region much away from the object
sure_bg = cv2.dilate(img_as_ubyte(im_filled), np.ones((3,3),np.uint8), iterations = 6)
    
# The remaining regions (borders) are those which we don’t know if they are img or background
borders = cv2.subtract(sure_bg, np.uint8(sure_fg))

# use Connected Components labelling: 
# scans an image and groups its pixels into components based on pixel connectivity
# label background of the image with 0 and other objects with integers starting from 1.

n_markers, markers1 = cv2.connectedComponents(np.uint8(sure_fg))

# filter small particles again! (bc of segmentation artifacts)
if remove_small_objects == True: 
    markers1 = sk.morphology.remove_small_objects(markers1, min_size = 1000)
    
# Make sure the background is 1 and not 0; 
# and that borders are marked as 0
markers2 = markers1 + 1
markers2[borders == 255] = 0

# implement the watershed algorithm: connects markers with original image
# The label image will be modified and the marker in the border area will change to -1
im_out = im.copy()
markers3 = cv2.watershed(im_out, markers2)

# generate an extra image with color labels only for visuzalization
# color markers in BLUE (pixels = -1 after watershed algorithm)
im_out[markers3 == -1] = [0, 255, 255]

in case you want to try to reproduce my results you can find my .tif file here: https://drive.google.com/file/d/13KfyUVyHodtEOP_yKAnfFCAhgyoY0BQL/view?usp=sharing

Thanks!

Solution

In the past, the best approach for me to apply the watershed algorithm is 'only when needed'. It is computationally intensive and not needed for the majority of cells in your image. This is the code I have used with your image:

# Threshold your image
# This example worked very well with a threshold value of 1
tv, thresh = cv2.threshold(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 1, 255, cv2.THRESH_BINARY)

# Minimize the holes in the cells to facilitate finding contours
for i in range(5):
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((3,3)))
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, np.ones((3,3)))

# Find contours and keep the ones big enough to be a cell
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = [c for c in contours if cv2.contourArea(c) > 400]
output = np.zeros_like(thresh)
cv2.drawContours(output, contours, -1, 255, -1)
for i, contour in enumerate(contours):
    x, y, w, h = cv2.boundingRect(contour)
    cv2.putText(output, f"{i}", (x, y), cv2.FONT_HERSHEY_PLAIN, 1, 255, 2)

The output of this code is this image: As you can see, only a pair of cells (contour #7) needs splitting using watershed algorithm. Running the watershed algorithm on that cell is very fast (smaller image to work with) and this is the result:

EDIT Some of the cell morphology calculations that can be used to assess whether the watershed algorithm should be run on an object in the image:

# area
area = cv2.contourArea(contour)
# perimeter, with the minimum value = 0.01 to avoid division by zero in other calculations
perimeter = max(0.01, cv2.arcLength(contour, True))
# circularity
circularity = (4 * math.pi * area) / (perimeter ** 2)
# Check if the cell is convex (not smoothly elliptical)
hull = cv2.convexHull(contour)
convexity = cv2.arcLength(hull, True) / perimeter
approx = cv2.approxPolyDP(contour, 0.1 * perimeter, True)
convex = cv2.isContourConvex(approx)

You will need to find the thresholds for each of the measurements in your project. In my project, cells were elliptic, and having a blob with a large area and convex usually means there are 2 or more cells lump together.