Tags: python, opencv, numpy, binary, edges

Fast method to retrieve contour mask from a binary mask in Python


I want to build a realtime application that involves finding the edges of a binary mask. I need something fast, without a GPU if possible, that hopefully runs below 0.0005 secs per image of size (1000,1000). I will be using the following example of a binary image, with size (1000,1000).

(Code to replicate:)

import numpy as np
im=np.zeros((1000,1000),dtype=np.uint8)
im[400:600,400:600]=255

(Image: a white 200x200 square centered on a black 1000x1000 background.)

The first logical way to do things fast was to use the OpenCV library:

import cv2
import timeit
timeit.timeit(lambda:cv2.Laplacian(im,cv2.CV_8U),number=100)/100
0.0011617112159729003

which, as expected, resulted in the Laplacian edge image.

I found this way too time consuming. After this I tried findContours:

def usingcontours(im):
    # In OpenCV 3.x, findContours returns (image, contours, hierarchy),
    # hence the [1][0] indexing to grab the first contour
    points=np.transpose(cv2.findContours(im,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[1][0])
    tmp=np.zeros_like(im)
    tmp[tuple(points)]=255
    return tmp
timeit.timeit(lambda:usingcontours(im),number=100)/100
0.0009052801132202148

which gave the same result as above. This is better, but still not as fast as I would like. As a last resort, I moved on to NumPy and approximated the Laplacian using the gradient, although I knew it would be worse:

def usinggradient(im):
    tmp=np.gradient(im)
    return ((tmp[0]+tmp[1])>0).astype(np.uint8)
timeit.timeit(lambda:usinggradient(im),number=100)/100
0.018681130409240722

So, does anyone have any further ideas on how I can accelerate this algorithm? I emphasize that the algorithm is meant for binary images only, so I suspect there must be a better, more specialized implementation.
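To make clearer what I mean by a binary-specific implementation, here is a rough, untimed sketch of one such idea (the name usingerosion is just for illustration): since the mask only contains 0 and 255, the inner boundary pixels are exactly the foreground pixels that disappear under a morphological erosion.

import cv2
import numpy as np

def usingerosion(im):
    # Illustrative sketch: foreground pixels with at least one background
    # neighbour are removed by the erosion, so subtracting the eroded mask
    # from the original leaves only the inner boundary.
    kernel=np.ones((3,3),dtype=np.uint8)
    return cv2.subtract(im,cv2.erode(im,kernel))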


Solution

  • I picked the fastest approach, the one based on cv2.findContours, and tried to speed it up. In it, we can replace those expensive transpose and tuple-conversion parts with simple slicing, like so -

    idx = cv2.findContours(im,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[1][0]
    out = np.zeros_like(im)
    out[idx[:,0,0],idx[:,0,1]] = 255
    

    Runtime test -

    In [114]: # Inputs
         ...: im=np.zeros((1000,1000),dtype=np.uint8)
         ...: im[400:600,400:600]=255
         ...: idx = cv2.findContours(im,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[1][0]
         ...: 
    
    In [115]: def original_app(im, idx):
         ...:     points=np.transpose(idx)
         ...:     tmp=np.zeros_like(im)
         ...:     tmp[tuple(points)]=255
         ...:     return tmp
         ...: 
         ...: def proposed_app(im, idx):
         ...:     out = np.zeros_like(im)
         ...:     out[idx[:,0,0],idx[:,0,1]] = 255
         ...:     return out
         ...: 
    
    In [120]: %timeit original_app(im, idx)
    10000 loops, best of 3: 108 µs per loop
    
    In [121]: %timeit proposed_app(im, idx)
    10000 loops, best of 3: 101 µs per loop
    
    In [122]: %timeit cv2.findContours(im,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
    1000 loops, best of 3: 1.55 ms per loop
    

    So, there's some marginal improvement there with the proposed method, but that seems negligible compared to the contour-finding itself.
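
    Putting the two pieces together for convenience, the full pipeline is just the same calls as above wrapped into one function (the name contour_mask is only for illustration) -

    def contour_mask(im):
        # Assumes OpenCV 3.x, where findContours returns
        # (image, contours, hierarchy), hence the [1][0] indexing
        idx = cv2.findContours(im,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)[1][0]
        out = np.zeros_like(im)
        out[idx[:,0,0],idx[:,0,1]] = 255
        return out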

    I also looked into scikit-image's version and ran a quick test, and it seems to be much slower than the OpenCV version.
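
    For reference, a test along these lines can be used for that comparison; I am assuming skimage.measure.find_contours is the scikit-image function meant here. Note that it returns sub-pixel float (row, col) coordinates rather than a pixel mask, so it is not a drop-in replacement to begin with -

    import timeit
    from skimage import measure

    # find_contours traces iso-valued curves; any level strictly between the
    # background (0) and foreground (255) values works for a binary mask
    timeit.timeit(lambda: measure.find_contours(im, 127.5), number=100)/100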