python matplotlib scatter-plot

How to draw a shaded area that tightly encloses all the points in a scatter plot?


I have pairs of y and z locations, which I'm plotting as a scatter plot as shown below.

[Image: yz loc scatter]

Looking at the plot, we can visualize a tight boundary that includes all of the points. How do we draw this boundary in Python? Ideally, I would like a filled region representing this area.

I've taken a look at scipy.spatial.ConvexHull, but it fails to capture the lower curve. My attempt:

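import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import ConvexHull

# yloc, zloc: 1-D arrays holding the y and z locations
plt.scatter(yloc, zloc)

points = np.column_stack((yloc, zloc))
hull = ConvexHull(points)

# Each simplex is one edge of the hull, given as a pair of point indices
for simplex in hull.simplices:
    plt.plot(points[simplex, 0], points[simplex, 1], 'r-')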

[Image: ConvexHull attempt]
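To see why ConvexHull misses the lower curve: a convex hull always bridges a concave section of the boundary with a straight edge. Below is a minimal sketch on synthetic data, not the data from this question. The half-ring shape, the point count and the `max_radius` threshold are all assumptions picked for illustration. It compares the hull area with the area that remains after dropping Delaunay triangles with a large circumradius, which is the basic idea behind alpha shapes.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

# Synthetic stand-in data (assumption): points spread uniformly over a half ring
rng = np.random.default_rng(0)
n_points = 5000
r_in, r_out = 2.0, 3.0
r = np.sqrt(rng.uniform(r_in**2, r_out**2, n_points))
theta = rng.uniform(0.0, np.pi, n_points)
points = np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Exact area of the half ring, for comparison
true_area = 0.5 * np.pi * (r_out**2 - r_in**2)

# For 2-D input, ConvexHull.volume is the enclosed area
hull_area = ConvexHull(points).volume

# Triangulate, then compute each triangle's area and circumradius
tri = Delaunay(points)
a, b, c = (points[tri.simplices[:, i]] for i in range(3))
ab, ac = b - a, c - a
tri_area = 0.5 * np.abs(ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0])
side_product = (
    np.linalg.norm(ab, axis=1)
    * np.linalg.norm(ac, axis=1)
    * np.linalg.norm(c - b, axis=1)
)
circumradius = side_product / (4.0 * tri_area)

# Keep only the small triangles; the threshold is an illustrative assumption
max_radius = 0.15
alpha_area = tri_area[circumradius < max_radius].sum()

print(f"true area:            {true_area:.2f}")
print(f"convex hull area:     {hull_area:.2f}")
print(f"filtered-triangle area: {alpha_area:.2f}")
```

The hull covers the empty middle of the ring as well, so its area comes out far larger than the region the points actually occupy, while the filtered triangulation stays close to it. A suitable `max_radius` depends on the point spacing and has to be tuned per data set.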

If you want to play with the data, it is available here. Y locations are under the header 'Points:1' and Z locations are under 'Points:2'.


Solution

  • Thanks to the users who pointed me to alphashape. The alphashape code provided by Keerthan draws piecewise-linear boundaries, which I wasn't completely satisfied with, so here is how I generated a smooth curve instead.

    Continuing from Keerthan's answer:

    from scipy.interpolate import splprep, splev
    
    # Instead of ax.add_patch, extract the exterior boundary points as a
    # (2, N) array: row 0 holds the y values, row 1 the z values.
    points = np.array(alpha_shape.exterior.coords).T
    
    # Create a closed (per=True) parametric spline along the boundary.
    # s controls the trade-off between smoothness and accuracy: a larger s
    # gives a smoother curve that follows the points less closely.
    tck, u = splprep(points, s=500, per=True)
    u_new = np.linspace(0, 1, 300)
    y_new, z_new = splev(u_new, tck)
    
    plt.fill(y_new, z_new, linewidth=2, color='tab:red', alpha=0.5)
    

    The result with s=500, which favours smoothness at the cost of not enclosing every point, is as follows:

    [Image: Smooth curve]
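To get a feel for how s trades smoothness against accuracy, here is a minimal sketch on a synthetic jagged ring that stands in for alpha_shape.exterior.coords. The ring, the noise level and the s values tried are assumptions for illustration, not the data above. It fits a closed spline for several s values and reports the largest distance between an input point and the fitted curve.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical stand-in for the alpha shape boundary: a noisy closed ring
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
radius = 10.0 + rng.normal(scale=0.5, size=theta.size)
y = radius * np.cos(theta)
z = radius * np.sin(theta)

# Repeat the first point at the end so the ring is explicitly closed,
# as the exterior coordinates of a shapely polygon are
y = np.append(y, y[0])
z = np.append(z, z[0])


def max_deviation(s):
    """Fit a periodic spline with smoothing s and return the largest
    distance between an input point and the curve at the same parameter."""
    tck, u = splprep([y, z], s=s, per=True)
    y_fit, z_fit = splev(u, tck)
    return np.hypot(y_fit - y, z_fit - z).max()


# s=0 interpolates the points; larger values allow the curve to drift
deviations = {s: max_deviation(s) for s in (0, 5, 50)}
for s, dev in deviations.items():
    print(f"s={s:>3}: max deviation = {dev:.4f}")
```

With s=0 the curve passes through every boundary point, so it keeps all the jaggedness. Raising s lets the curve drift away from the points, which is why a heavily smoothed outline can leave a few scatter points outside the filled region. A reasonable s scales with the number of boundary points and the units of the data, so the value 500 used above will not transfer directly to other data sets.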