Tags: python, computer-vision, gradient, edge-detection, corner-detection

Corner detection algorithm gives very high values for slanted edges?


I have tried implementing a basic version of the Shi-Tomasi corner detection algorithm. It works fine for corners, but I came across a strange issue: the algorithm also gives high values for slanted (tilted) edges.

Here's what I did:

I expected it to give maximum values at corners, where both dx and dy are high, but I found it giving high values even for tilted edges.

Here are some of the outputs I received. First image: [image]

Result: [image]

So far so good: corners have high values.

Another image: [image]

Result: [image]

Here's where the problem lies: edges have high values, which the algorithm should not produce. I can't fathom how edges can have high values for both the x and y gradients (Sobel being a close approximation of the gradient).

I would appreciate any help in fixing this issue for edges. I am open to any suggestions and ideas.

Here's my code (if it helps):

import cv2
import numpy as np

def st(image, w_size):
    v = []
    dy, dx = sy(image), sx(image)

    dy = dy**2
    dx = dx**2
    dxdy = dx*dy

    dx = cv2.GaussianBlur(dx, (3,3), cv2.BORDER_DEFAULT)
    dy = cv2.GaussianBlur(dy, (3,3), cv2.BORDER_DEFAULT)
    dxdy = cv2.GaussianBlur(dxdy, (3,3), cv2.BORDER_DEFAULT)

    ofset = int(w_size/2)
    for y in range(ofset, image.shape[0]-ofset):
        for x in range(ofset, image.shape[1]-ofset):

            s_y = y - ofset
            e_y = y + ofset + 1

            s_x = x - ofset
            e_x = x + ofset + 1

            w_Ixx = dx[s_y: e_y, s_x: e_x]
            w_Iyy = dy[s_y: e_y, s_x: e_x]
            w_Ixy = dxdy[s_y: e_y, s_x: e_x]

            sum_xx = w_Ixx.sum()
            sum_yy = w_Iyy.sum()
            sum_xy = w_Ixy.sum()
            #sum_r = w_r.sum()

            m = np.matrix([[sum_xx, sum_xy],
                         [sum_xy, sum_yy]])

            eg = np.linalg.eigvals(m)

            v.append((min(eg[0], eg[1]), y, x))
    return v

def sy(img):
    t = cv2.Sobel(img,cv2.CV_8U,0,1,ksize=3)
    return t
def sx(img):
    t = cv2.Sobel(img,cv2.CV_8U,1,0,ksize=3)
    return t

Solution

  • You misunderstood the Shi-Tomasi method. You are computing the two derivatives dx and dy, locally averaging them (the sum differs from the local average only by a constant factor, which we can ignore), and then taking the lowest value. The Shi-Tomasi measure instead refers to the structure tensor: it uses the smaller of the two eigenvalues of this matrix.

    The structure tensor is a matrix formed by the outer product of the gradient with itself, then smoothed:

    [ smooth(dx*dx)  smooth(dx*dy) ]
    [ smooth(dx*dy)  smooth(dy*dy) ]
    

    That is, we take the x-derivative dx and the y-derivative dy, form the three images dx*dx, dy*dy and dx*dy, and smooth these three images. Now for each pixel we have three values that together form a symmetric matrix. This is called the structure tensor.
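
    In code, forming these three smoothed images takes only a few vectorized lines. Here is a minimal sketch (the helper name structure_tensor is my own choosing, and sigma is left as a free parameter):

    import cv2

    def structure_tensor(image, sigma=2.0):
        # Compute derivatives in floating point, so negative values survive.
        dx = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
        dy = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
        # Outer product of the gradient with itself, then smoothed.
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        Ixx = cv2.GaussianBlur(dx * dx, (0, 0), sigma)
        Ixy = cv2.GaussianBlur(dx * dy, (0, 0), sigma)
        Iyy = cv2.GaussianBlur(dy * dy, (0, 0), sigma)
        return Ixx, Ixy, Iyy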

    The eigenvalues of this structure tensor say something about the local edges. If both are small, there are no edges in the neighborhood. If one is large, then there's a single edge direction in the local neighborhood. If both are large, there's something more complex going on, likely a corner. The larger the smoothing window, the larger the local neighborhood we're examining. It is important to select a neighborhood size that matches the size of the structure we're looking at.

    The eigenvectors of the structure tensor say something about the orientation of the local structure. If there's an edge (one eigenvalue is large), then the corresponding eigenvector will be the normal to this edge.

    Shi-Tomasi uses the smaller of the two eigenvalues. If the smaller eigenvalue is large, then there's something more complex than an edge going on in the local neighborhood.
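
    Because the structure tensor is a symmetric 2x2 matrix, its smallest eigenvalue has a closed form, so the per-pixel eigendecomposition can be replaced by whole-image arithmetic. A sketch, reusing the structure_tensor helper from above:

    import numpy as np

    def shi_tomasi_response(Ixx, Ixy, Iyy):
        # Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]] are
        # (a + c)/2 +/- sqrt(((a - c)/2)**2 + b**2); keep the smaller one.
        half_trace = (Ixx + Iyy) / 2
        radius = np.sqrt(((Ixx - Iyy) / 2)**2 + Ixy**2)
        return half_trace - radius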

    The Harris corner detector also uses the structure tensor, but it combines the determinant and trace to obtain a similar result at less computational cost. Shi-Tomasi is better but more expensive to compute, because the eigenvalue computation requires taking square roots. The Harris detector is an approximation to the Shi-Tomasi detector.
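
    For comparison, the Harris response needs only the determinant and trace of the same three smoothed images. A sketch under the same assumptions (k is conventionally chosen around 0.04 to 0.06):

    def harris_response(Ixx, Ixy, Iyy, k=0.04):
        # R = det(M) - k * trace(M)**2; no eigenvalues, hence no square roots.
        det = Ixx * Iyy - Ixy ** 2
        trace = Ixx + Iyy
        return det - k * trace ** 2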

    Here is a comparison of Shi-Tomasi (top) and Harris (bottom). I've clipped both to half their max value, as the max values occur in the text area, and this lets us see the weaker responses at the relevant corners better. As you can see, Shi-Tomasi has a more uniform response to all the corners in the image.

    Shi-Tomasi result: [image]

    Harris result: [image]

    For both of these, I've used a Gaussian window with sigma=2 for the local averaging (which, using a cutoff of 3 sigma, leads to a 13x13 averaging window).
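
    That window size follows directly from the cutoff, since the kernel spans 3 sigma on each side of the center pixel:

    import numpy as np
    sigma = 2
    window = 2 * int(np.ceil(3 * sigma)) + 1  # 3-sigma cutoff on both sides
    print(window)  # 13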


    Looking at your updated code, I see several issues. I've annotated these with comments here:

    def st(image, w_size):
        v = []
        dy, dx = sy(image), sx(image)
    
        dy = dy**2
        dx = dx**2
        dxdy = dx*dy
        # Here you have dxdy=dx**2 * dy**2, because dx and dy were changed
        # in the lines above.
    
        dx = cv2.GaussianBlur(dx, (3,3), cv2.BORDER_DEFAULT)
        dy = cv2.GaussianBlur(dy, (3,3), cv2.BORDER_DEFAULT)
        dxdy = cv2.GaussianBlur(dxdy, (3,3), cv2.BORDER_DEFAULT)
        # Gaussian blur size should be indicated with the sigma of the Gaussian,
        # not with the size of the kernel. A 3x3 kernel corresponds, in OpenCV,
        # to a Gaussian with sigma = 0.8, which is way too small. Use sigma=2.
    
        ofset = int(w_size/2)
        for y in range(ofset, image.shape[0]-ofset):
            for x in range(ofset, image.shape[1]-ofset):
    
                s_y = y - ofset
                e_y = y + ofset + 1
    
                s_x = x - ofset
                e_x = x + ofset + 1
    
                w_Ixx = dx[s_y: e_y, s_x: e_x]
                w_Iyy = dy[s_y: e_y, s_x: e_x]
                w_Ixy = dxdy[s_y: e_y, s_x: e_x]
    
                sum_xx = w_Ixx.sum()
                sum_yy = w_Iyy.sum()
                sum_xy = w_Ixy.sum()
                # We've already done the local averaging using GaussianBlur,
                # this summing is now no longer necessary.
    
                m = np.matrix([[sum_xx, sum_xy],
                             [sum_xy, sum_yy]])
    
                eg = np.linalg.eigvals(m)
    
                v.append((min(eg[0], eg[1]), y, x))
        return v
    
    def sy(img):
        t = cv2.Sobel(img,cv2.CV_8U,0,1,ksize=3)
        # The output of Sobel has positive and negative values. By writing it
        # into a 8-bit unsigned integer array, you lose all these negative
        # values, they become 0. This is half your edges that you lose!
        return t
    def sx(img):
        t = cv2.Sobel(img,cv2.CV_8U,1,0,ksize=3)
        return t
    

    This is how I modified your code:

    import cv2
    import numpy as np
    import matplotlib.pyplot as pp
    
    def st(image):
        dy, dx = sy(image), sx(image)
        dxdx = cv2.GaussianBlur(dx**2, ksize=(0, 0), sigmaX=2)  # ksize=(0, 0): size derived from sigma
        dydy = cv2.GaussianBlur(dy**2, ksize=(0, 0), sigmaX=2)
        dxdy = cv2.GaussianBlur(dx*dy, ksize=(0, 0), sigmaX=2)
        for y in range(image.shape[0]):
            for x in range(image.shape[1]):
                m = np.matrix([[dxdx[y,x], dxdy[y,x]],
                               [dxdy[y,x], dydy[y,x]]])
                eg = np.linalg.eigvals(m)
                image[y,x] = min(eg[0], eg[1])  # Write into the input image.
                                                # Better would be to create a new
                                                # array as output. Make sure it is
                                                # a floating-point type!
    
    def sy(img):
        t = cv2.Sobel(img,cv2.CV_32F,0,1,ksize=3)
        return t
    
    def sx(img):
        t = cv2.Sobel(img,cv2.CV_32F,1,0,ksize=3)
        return t
    
    image = cv2.imread('fu4r5.png', 0)
    output = image.astype(np.float32)  # I'm writing the result of the detector in here
    st(output)
    pp.imshow(output); pp.show()
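
    If you want to sanity-check the output, OpenCV also ships this detector as cv2.cornerMinEigenVal (cv2.goodFeaturesToTrack builds on it). It averages over a square blockSize window rather than a Gaussian, so the response is similar but not identical:

    # blockSize=13 roughly matches the 13x13 Gaussian window used above.
    builtin = cv2.cornerMinEigenVal(image, blockSize=13, ksize=3)
    pp.imshow(builtin); pp.show()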