python-3.x scikit-image point-clouds ransac

Finding powerlines in LIDAR point clouds with RANSAC

I'm trying to find powerlines in LIDAR point clouds with the ransac() function from skimage.measure. This is my very first time working with these modules in Python, so bear with me.

So far, the only thing I can do reliably is filter out low or 'ground' points to reduce the number of points I have to deal with.

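import laspy

def filter_Z(las, threshold):
    # New LAS object with the same point format and file version as the input
    filtered = laspy.create(point_format = las.header.point_format, file_version = las.header.version)
    # Keep points more than `threshold` above the lowest point (las.Z holds raw, unscaled integers)
    filtered.points = las.points[las.Z > las.Z.min() + threshold]

    print(f'original size: {len(las.points)}')
    print(f'filtered size: {len(filtered.points)}')

    filtered.write('filtered_points2.las')

    return filtered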

I set the threshold by hand, because the .las files I worked with contain some nasty outliers that prevent me from calculating it dynamically.

The filtered point cloud, or at least one of them, looks like this:

Note the evil red outliers on top, maybe they're birds or something. Along with them are trees and roofs of buildings. If anyone wants to take a look at the .las files, let me know. I can't put a wetransfer link in the body of the question.

A top down view:

I've looked into it as much as I could, and found the skimage.measure module and the ransac function that comes with it. I played around a bit to get a feel for it and currently I'm stumped on how to continue.

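from skimage.measure import LineModelND, ransac

def ransac_linefit_sklearn(points):
    # Despite the name, this uses scikit-image (skimage), not scikit-learn
    model_robust, inliers = ransac(points, LineModelND, min_samples=2, residual_threshold=1000, max_trials=1000)
    return model_robust, inliers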

The result is quite predictable (I ran ransac on a 2D view of the cloud just to make it a bit easier on the PC).

This doesn't yield good results in examples like the one I posted. The vegetation clusters contain so many points that the line is fitted through them, because that is where the point density is highest.

I tried DBSCAN() to cluster up the points but it didn't work. I also attempted OPTICS() but as I write it still hasn't finished running.

From what I've read in various articles, the best course of action would be to cluster the points and run RANSAC on each individual cluster to find lines, but I'm not sure how to do that or which clustering method to use in situations like these.

One thing I'm also curious about is simply filtering out the big blobs of trees that mess with the model fitting.

Solution

Inadequacy of RANSAC

RANSAC works best when your data follows a unimodal distribution around your model. For this point cloud, that means it works best when there is only one line plus outliers, but there are at least 5 lines in the bird's-eye view, so the points of every other line (and of the vegetation) compete with the line you are trying to fit. Check out this older SO post that discusses your problem. Francesco's response suggests an iterative RANSAC-based approach.
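As a rough illustration of the iterative idea (not necessarily what Francesco's answer does in detail), one can fit a line, remove its inliers, and repeat on what is left. In this minimal sketch the function name `sequential_ransac` and the parameters `max_lines` and `min_inliers` are hypothetical, and the thresholds would need tuning for real data:

```python
import numpy as np
from skimage.measure import LineModelND, ransac


def sequential_ransac(points, max_lines=5, residual_threshold=1.0,
                      min_inliers=50, max_trials=1000):
    """Fit up to max_lines lines, removing each line's inliers before the next fit.

    All names and default values here are illustrative assumptions.
    """
    remaining = np.asarray(points, dtype=float)
    lines = []
    for _ in range(max_lines):
        # Not enough points left to support another line.
        if len(remaining) < max(min_inliers, 2):
            break
        model, inliers = ransac(remaining, LineModelND, min_samples=2,
                                residual_threshold=residual_threshold,
                                max_trials=max_trials)
        # Stop once the best remaining candidate has too little support.
        if model is None or inliers is None or inliers.sum() < min_inliers:
            break
        origin, direction = model.params
        lines.append((origin, direction, remaining[inliers]))
        # Drop this line's inliers so the next fit finds a different line.
        remaining = remaining[~inliers]
    return lines, remaining
```

Each entry of `lines` holds the fitted origin, the direction vector, and the inlier points of one line; `remaining` is whatever no line claimed. Dense vegetation can still win an early iteration, so this probably works better after the blobs have been filtered out.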

Octrees and SVD

Colleagues worked on a similar problem in my previous job. I am not deeply familiar with the approach, but I know enough to provide some hints.

Their approach resembled Francesco's suggestion. They partitioned the point cloud with an octree and calculated the singular value decomposition (SVD) of the points within each partition. The three resulting singular values describe the geometric distribution of the data.

If the first singular value is significantly greater than the other two, then the points are line-like.
If the first and second singular values are significantly greater than the third, then the points are plane-like.
If all three values are of similar magnitude, then the data is just a "glob" of points.

They used these rules iteratively to rule out which points were most likely NOT part of the lines.
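The singular-value rules can be sketched as follows. This is only an approximation of what my colleagues did: it uses a flat voxel grid instead of a real octree, and the function names, the `ratio=10.0` cutoff, and `min_points` are assumptions made for illustration.

```python
import numpy as np


def classify_shape(points, ratio=10.0):
    """Label an (N, 3) point set 'line', 'plane' or 'blob' from its singular values.

    Needs N >= 3. `ratio` is an arbitrary cutoff for "significantly greater".
    """
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # sorted in descending order
    s = np.maximum(s, 1e-12)                       # guard against exact zeros
    if s[0] > ratio * s[1]:
        return "line"
    if s[1] > ratio * s[2]:
        return "plane"
    return "blob"


def classify_voxels(points, voxel_size, min_points=10, ratio=10.0):
    """Assign each point the shape label of the voxel it falls into."""
    keys = np.floor(points / voxel_size).astype(int)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # shape differs between some NumPy versions
    labels = np.full(len(points), "sparse", dtype=object)
    for v in range(len(uniq)):
        mask = inverse == v
        if mask.sum() >= min_points:
            labels[mask] = classify_shape(points[mask], ratio)
    return labels
```

Points labelled "plane" or "blob" (roofs, vegetation) can then be discarded, and the "line" points passed on to a line fit. The voxel size matters: it should be large enough to hold a stretch of cable but small enough that a cable and a tree rarely share a cell.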

Literature

If you want to look into published methods, maybe this paper is a good starting point. Power lines are modeled as hyperbolic functions.
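For context, a cable hanging between two supports follows a catenary, which is a hyperbolic cosine. The paper's exact formulation may differ, so treat the following as a generic sketch: it fits z = z0 + a * (cosh((s - s0) / a) - 1) to the points of one line, where s is the horizontal distance along the line (for example, the projection of the inlier points onto the fitted line direction). The function names and the initial-guess heuristic are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def catenary(s, s0, z0, a):
    """Height of a hanging cable at horizontal position s; lowest point is (s0, z0)."""
    return z0 + a * (np.cosh((s - s0) / a) - 1.0)


def fit_catenary(s, z):
    """Least-squares fit of catenary parameters (s0, z0, a) to points (s, z)."""
    s = np.asarray(s, dtype=float)
    z = np.asarray(z, dtype=float)
    # Initial guess: the lowest observed point, and `a` from a parabola fit,
    # since near its lowest point a catenary looks like z0 + (s - s0)**2 / (2 * a).
    i = np.argmin(z)
    c2 = np.polyfit(s, z, 2)[0]
    a0 = 1.0 / (2.0 * c2) if c2 > 0 else (s.max() - s.min())
    params, _ = curve_fit(catenary, s, z, p0=[s[i], z[i], a0])
    return params
```

A poor fit (large residuals, or an unrealistic `a`) is a hint that the candidate points are not a single cable span.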