python numpy matplotlib median scatter

Running median of y-values over a range of x


Below is a scatter plot I constructed from two numpy arrays.

[Image: Scatter Plot Example]

What I'd like to add to this plot is a running median of y over a range of x. I've photoshopped in an example:

[Image: Modified Scatter Plot]

Specifically, I need the median for data points in bins of 1 unit along the x axis between two values (this range will vary between many plots, but I can manually adjust it). I appreciate any tips that can point me in the right direction.


Solution

  • I would use np.digitize to do the bin sorting for you. This way you can easily apply any function and set the range you are interested in.

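    import numpy as np
    import matplotlib.pyplot as plt
    
    N = 2000
    total_bins = 10
    
    # Sample data
    X = np.random.random(size=N)*10
    Y = X**2 + np.random.random(size=N)*X*10
    
    # Bin edges spanning the data; adjust the endpoints to change the range
    bins = np.linspace(X.min(),X.max(), total_bins)
    delta = bins[1]-bins[0]
    # idx==k marks points with bins[k-1] <= x < bins[k]; idx==0 is always empty here
    idx  = np.digitize(X,bins)
    # Median per bin; nan for empty bins (nan points are skipped when plotting)
    running_median = [np.median(Y[idx==k]) if np.any(idx==k) else np.nan
                      for k in range(total_bins)]
    
    plt.scatter(X,Y,color='k',alpha=.2,s=2)
    # bins[k]-delta/2 is the center of the bin that ends at bins[k]
    plt.plot(bins-delta/2,running_median,'r--',lw=4,alpha=.8)
    plt.axis('tight')
    plt.show()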
    

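The indexing convention of np.digitize is easy to get wrong, so here is a small sketch of what it returns. The edges and x arrays are made-up values for illustration only. With the default right=False, an index i means edges[i-1] <= x < edges[i]; values below the first edge get 0 and values at or above the last edge get len(edges).

```python
import numpy as np

# Hypothetical bin edges and sample points, chosen only to show the indexing
edges = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([-0.5, 0.0, 0.9, 1.0, 2.5, 3.0, 4.2])

# Index i means edges[i-1] <= x < edges[i]
# 0 -> below edges[0]; len(edges) -> at or above edges[-1]
idx = np.digitize(x, edges)
print(idx)  # [0 1 1 2 3 4 4]

# Points that fall inside the second bin, [1.0, 2.0)
in_second_bin = x[idx == 2]
print(in_second_bin)  # [1.]
```

This is why the bin index 0 never matches any point when the first edge is the data minimum, and why a point exactly equal to the last edge lands outside the final bin.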

    As an example of the versatility of the method, let's add error bars given by the standard deviation of each bin:

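    # Standard deviation per bin, aligned with running_median (nan for empty bins)
    running_std = [Y[idx==k].std() if np.any(idx==k) else np.nan
                   for k in range(total_bins)]
    # fmt='none' draws only the error bars, without markers or a connecting line
    plt.errorbar(bins-delta/2, running_median,
                 running_std, fmt='none')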
    

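The question asks for bins exactly 1 unit wide between two chosen x values. One way to adapt the same np.digitize idea is sketched below. The helper name binned_median and its parameters x_min, x_max and width are my own and not part of NumPy, and the sketch assumes that a trailing partial bin (when the range is not a multiple of width) should simply be dropped.

```python
import numpy as np

def binned_median(x, y, x_min, x_max, width=1.0):
    """Median of y in consecutive bins of `width` starting at x_min.

    Hypothetical helper for illustration. Returns (centers, medians);
    bins with no points get nan. Points outside [x_min, last edge) are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Number of whole bins that fit in the range (a partial last bin is dropped)
    n_bins = int(np.floor((x_max - x_min) / width))
    edges = x_min + width * np.arange(n_bins + 1)
    centers = edges[:-1] + width / 2

    # idx == k means edges[k-1] <= x < edges[k]; 0 and n_bins+1 are out of range
    idx = np.digitize(x, edges)

    medians = np.full(n_bins, np.nan)
    for k in range(1, n_bins + 1):
        in_bin = idx == k
        if in_bin.any():
            medians[k - 1] = np.median(y[in_bin])
    return centers, medians

# Small made-up data set to show the behaviour
x_demo = [2.1, 2.9, 3.5, 7.99, 8.0, 1.0]
y_demo = [1.0, 3.0, 10.0, 5.0, 100.0, -50.0]
centers, medians = binned_median(x_demo, y_demo, x_min=2, x_max=8)
print(centers)
print(medians)
```

With the scatter data from the answer, something like plt.plot(centers, medians, 'r--') should then draw the median line only over the chosen range, and changing x_min and x_max per plot adjusts that range.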