pythonmatplotlibstatisticsellipse

How to find point inside an ellipse?


I'm trying to find points inside the ellipse. It is not an 'ordinary' ellipse actually it is based on average and standard deviation which is much easier than calculating eigen values in order to find confidence interval

Function is not written by me here are the sources https://matplotlib.org/devdocs/gallery/statistics/confidence_ellipse.html https://carstenschelp.github.io/2018/09/14/Plot_Confidence_Ellipse_001.html

Here is the code:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms

x = np.array([21.5,16.3,13.7,20.0,17.4,10.4,16.9,7.0,13.8,15.2,13.8,8.2,18.0,9.4,13.2,7.2,21.2,30.2,13.5,29.8,18.3,20.2,31.1,21.5,29.8,18.0,13.1,24.1,32.5,15.4,16.1,15.0,25.9,3.0,17.0,23.6,17.6,-11.8,22.2,26.6,17.8,20.6,23.0,28.0,25.3,22.1,22.4,16.3,22.0,12.1])
y = np.array([92.4,98.2,97.6,95.9,96.5,92.1,89.6,89.4,89.2,89.4,90.2,86.7,89.5,89.9,90.2,87.6,104.0,87.3,99.4,85.4,92.8,92.0,87.9,96.2,94.1,95.2,95.6,86.3,87.6,89.5,95.0,97.1,93.0,87.8,98.9,98.2,100.1,45.4,92.1,91.6,94.7,93.9,91.4,91.1,95.7,93.8,96.4,94.1,94.0,89.1])

#function obtained from matplotlib documentation
#https://matplotlib.org/devdocs/gallery/statistics/confidence_ellipse.html

def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
    """
    Create a plot of the covariance confidence ellipse of *x* and *y*.
    Parameters
    ----------
    x, y : array-like, shape (n, )
        Input data.
    ax : matplotlib.axes.Axes
        The axes object to draw the ellipse into.
    n_std : float
        The number of standard deviations to determine the ellipse's radiuses.
    **kwargs
        Forwarded to `~matplotlib.patches.Ellipse`
    Returns
    -------
    matplotlib.patches.Ellipse
    """
    if x.size != y.size:
        raise ValueError("x and y must be the same size")

    cov = np.cov(x, y)
    pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
    # Using a special case to obtain the eigenvalues of this
    # two-dimensionl dataset.
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor=facecolor, **kwargs)

    # Calculating the stdandard deviation of x from
    # the squareroot of the variance and multiplying
    # with the given number of standard deviations.
    scale_x = np.sqrt(cov[0, 0]) * n_std
    mean_x = np.mean(x)

    # calculating the stdandard deviation of y ...
    scale_y = np.sqrt(cov[1, 1]) * n_std
    mean_y = np.mean(y)

    transf = transforms.Affine2D() \
        .rotate_deg(45) \
        .scale(scale_x, scale_y) \
        .translate(mean_x, mean_y)

    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)


#implementation
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
ax.scatter(x,y,s=5)
ellipse = confidence_ellipse(x,y,ax,n_std=2,edgecolor='red')
plt.show()

Output visual

Afterwards I tried to find get center coordinates and the points inside the ellipse as below:

ellipse.get_center()
Out:(0,0)
ellipse.contains_point([21.5,92.4])#first points in x,y arrays
Out:False
ellipse.contains_point([0,0])#get_center() result
Out:False

Ellipse plot is working fine but I need every points coordinates inside the ellipse. What I am doing wrong? I already checked similar questions but they didn't work either.


Solution

  • The confidence_ellipse example function only returns an object for drawing, and the contains point will only tell you if the point is on the ellipse.

    What you probably want is something like:

    import math
    class distribution():
        def __init__(self,x,y):
            self.cov  = np.cov(x, y)        
            self.mean = np.matrix( [np.mean(x), np.mean(y)])
        def dist(self, x,y):
            tmp = np.matrix([x,y])
            diff = self.mean - tmp
            dist = diff * np.linalg.inv(self.cov) * diff.T
            return math.sqrt(dist)
        def is_inside(self, x,y,nstd=2.0):
        if (self.dist(x,y) < nstd):
            return True
        else:
            return False
            
    

    Then you can do :

    d = distribution(x,y)    
    d.is_inside(24.1,86.3)
    

    Returns true.

    Then, for all the points:

    points = np.array(list(zip(x, y)))
    
    points_in  = list(filter(lambda p: d.is_inside(p[0],p[1]), points))
    points_out =  list(filter(lambda p: not d.is_inside(p[0],p[1]), points))
    x_in = [ x[0] for x in points_in] 
    y_in = [ x[1] for x in points_in] 
    
    x_out = [ x[0] for x in points_out] 
    y_out = [ x[1] for x in points_out] 
    
    
    fig2, ax2 = plt.subplots(1, 1, figsize=(8, 8))
    ax2.scatter(x_in,y_in,s=5, facecolor="green")
    ax2.scatter(x_out,y_out, s=5, facecolor="red")
    ellipse = confidence_ellipse(x,y,ax2,n_std=2,edgecolor='red') # this presupposes that you still have the confidence_ellipse still defined
    
    plt.show()
    

    And your output should look something like this: enter image description here Where the red points are more than 2 standard deviations away, and the green ones are inside.