pythonopencvimage-processingastronomy

Identifying outliers from openCV contours based on curvature?


I'm processing a photo of the Earth from the Moon, as seen here. Image of the Earth

What I want to do is get the contour points that correspond to the edge of the Earth that is not in shadow. (e.g., the areas that should form an ellipsoid). I want the physical points on the edge of globe itself, and I'm trying to avoid using Gaussian blurring to avoid data destruction.

I have this code that produces contours.

import numpy as np
import cv2 as cv

# Read in Image, convert to grayscale
image_path = r"moon_check.png"
im = cv.imread(image_path)
imgray = cv.cvtColor(im, cv.COLOR_BGR2GRAY)

# Threshold, then find contours
ret, thresh = cv.threshold(imgray, 0, 1, cv.THRESH_BINARY + cv.THRESH_OTSU)
contours, hierarchy = cv.findContours(thresh, cv.RETR_EXTERNAL , cv.CHAIN_APPROX_NONE)
contours = sorted(contours, key=cv.contourArea, reverse=True)
test = im
cnt = contours[0]

The resulting contour is pretty good:

Processed Earth Image with overlaid Contour

... but I want to reduce the Contour elements to a non-contiguous one that excludes the shadows below the terminator, leaving me with only the curved elements near the top.

I've thus far tried the following with poor results:

  1. Manual curvature sampling does work to a certain degree, but I've not found a good 2nd Order derivative check to exclude the regions at the bottom beyond some really manual tuning.

  2. OpenCV minEnclosingCircle doesn't fit very well, possibly due to the oblate spheroid nature of the Earth. Plus, going from the fit to keeping the points closest to that fit is fairly inelegant. I guess I could apply it iteratively and filter by min-distance to the enclosing circle, but that doesn't seem particularly efficient.

  3. I tried the OpenCV Hough's Circle, but it didn't work well using the grayscale image-- probably due to my avoidance of blurring the image etc.

  4. The shadow contours make a classic circle fit method like Kasa's or Coope's not work well.

  5. Convex matching should work for most of it, but many elements on the bottom are also convex.

Any suggestions on methods to explore? Should I be changing how I approach the initial contour creation in the first place?


Solution

  • RANSAC would be overkill

    (but good to know about if you don't know)

    I would suggest RANSAC, as I almost always do when you have a regression whose problem is not accuracy of the data, but numerous outliers. Even more when it is a linear regression (theoretically RANSAC works with any regression. But in practice, there are zillions of libraries with RANSAC applied to linear regression, and very few with more general models).

    But in your case it is even simpler. You don't really need some random choosing of the points (well, that almost what I am about to do. But not completely random), since you know that the correct points are contiguous. So, you just have to choose a subsequence of your sequence of points that yield to the smallest error.

    It is a linear problem

    (Kåsa and Coope's method)

    First of all, finding a circle from points is a linear problem. You know that for each sample point (x,y), we should have
    (x-x₀)² + (y-y₀)² = R²
    That is
    x² + x₀² - 2x.x₀ + y² + y₀² - 2y.y₀ - R² = 0
    that is 2x₀·x + 2y₀·y + (R²-x₀²-y₀²)·1 = x²+y²

    where x₀, y₀ and R are the unknown I am looking for, that is center and radius of the circle.

    So all you have to find are 2 coefficient (α=2x₀ and β=2y₀), and an intercept (γ=R²-x₀²-y₀²), such as αx+βy+γ = x²+y². Which is just a linear regression problem, easily solvable by a Moore-Penrose inverse.

    So, if all your points were ok this code would find the circle

    X=np.array([cnt[:,0,0],cnt[:,0,1],np.ones((len(cnt),))]).T # 3 columns of x, y and 1
    Y=cnt[:,0,0]**2+cnt[:,0,1]**2 # column of x²+y²
    # So we are looking for coefficient such as X@cf = Y. Which is done by Moore-Penrose
    cf = np.linalg.inv(X.T@X)@X.T@Y # Those are our α, β, γ
    # Now we just have to deduce x0, y0, R from that
    x0 = cf[0]/2 # since α=2x₀
    y0 = cf[1]/2 # since β=2y₀
    R = np.sqrt(cf[2]+x0**2+y0**2) # Since γ=R²-x₀²-y₀²
    

    But of course, it doesn't work, since most of the example points are not on the circle (from your image, one may believe that you have some 60% correct points, but in reality, the fractal nature of the shadow is such as it is more something like 15% correct points. 85% are the shadow border)

    And a linear regression find a compromise between BS and correctness.

    Find a subset of point with good fitting

    RANSAC algorithm try to find by trial an error a subset of points such as the number of points that are close to the model is big enough. Then increase the subset with all points close enough. Redo the regression.

    What I do is almost that. Except that I don't need the increase part, nor really the random part. I just bet that there is a subset of W=500 consecutive points that fit well an arc of circle.

    W=500 # Number of consecutive points that may be on the circle
    results=[]
    for i in range(0, len(cnt)-W, 50):
        # Subset
        XX=X[i:i+W]
        YY=Y[i:i+W]
        # Same as before, on subset
        cf=np.linalg.inv(XX.T@XX)@XX.T@YY
        x0=cf[0]/2
        y0=cf[1]/2
        R=np.sqrt(cf[2]+x0**2+y0**2)
        # Error computation
        rmse = ((np.sqrt(((XX[:,:2]-[x0,y0])**2).sum(axis=1))-R)**2).mean() # RMSE
        results.append([x0,y0,R,rmse])
    

    Select the best

    results=np.array(results)
    bestI = np.argmin(results[:,3])
    x0,y0,R=results[bestI, :3]
    

    earth image+contour points+guessed circle

    See how we can't barely see the green point, except in the shadow border (because the circle is almost perfect — and my screenshot lowres :D)

    We could improve that, of course, by doing what would be the next step of that "not-random ransac" (we have have done now, is finding a subset of points that match well an arc circle. And find the circle parameters for that subset, that is a model. Now that we have done so, we could go back to the whole set of points, select all those that are close enough to our model circle. And redo a regression again, to fine tune accuracy).

    But I guess this post is long enough. That you could do that if you want without help, now that I gave you the start. And your contours were any way so perfectly circular, than even when using a small arc of the circle, we have enough to find a really good fit.

    Back to the original question

    All that was to answer to your different trials of HoughCircle, MinCircle, ... But your title question is to remove outlier. Well, that is almost what I did as an intermediary step.

    Removing outliers, is basically what I let aside in the previous paragraph: the points that are too far from this circle are the outliers.

    Side node

    You said

    minEnclosingCircle doesn't fit very well, possibly due to the oblate spheroid nature of the Earth

    There is no oblate spheroid nature of the Earth. Well, there is, of course. That is physics. Rotating planets are not perfect sphere. But that is way less important that people tend to believe, because of exaggerated pictures (a little bit like the misconception about impact of mountains and abyss. In both case we are talking something in the order of 20 kms, compared to a diameter of 12700 kms. That is barely 2 pixels on your image.

    For most coder, the earth is a sphere. People who need to know that it is an oblate spheroid, not a sphere, are usually not coders :D (and if they are, they have way better image that you do :D)

    I tried the OpenCV Hough's Circle, but it didn't work well using the grayscale image-- probably due to my avoidance of blurring the image etc.

    No. It never does. (I am kidding of course. But every time I need to detect some circle in an image — and that has happened a few times, always with some specificity that make impossible to reuse directly the previous time method — I try it. And every time, I need to find something else). If wouldn't blame bluring on here. By its nature, Hough need some binning.