excel numpy percentile

Explaining the different methods for percentile calculation in NumPy


I was trying to create some box plots of my data. I first plotted them in Excel, but wanted to move to Python to be able to customize them. However, I was surprised to see that the two packages place the quartiles Q1 and Q3 at different positions.

I noticed that Excel uses the "Exclusive median" calculation by default to display the quartiles. Is there something similar among the methods for calculating percentiles in NumPy, and if so, how is each method calculated? I have not been able to find good information on each of them.

Thank you in advance.

I tried all of the methods for quantile calculation in NumPy and could not reproduce the values given by Excel's Exclusive median method.


Solution

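  • Given a sorted array a of length n, each method assigns a percentile rank to every element and interpolates linearly between neighbouring elements. For the element at 0-based index i, the rank is (i+1)/(n+1) with the exclusive method and i/(n-1) with the inclusive method.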


    Example for a = [0, 1, 2, 3, 4]:

    Exclusive: np.arange(1, len(a)+1) / (len(a)+1) gives
    array([0.16666667, 0.33333333, 0.5       , 0.66666667, 0.83333333])
    i.e. the 25th percentile lies halfway between the first and second elements (= 0.5)

    Inclusive: np.arange(len(a)) / (len(a)-1) gives
    array([0.  , 0.25, 0.5 , 0.75, 1.  ])
    i.e. the 25th percentile is exactly the second element (= 1)

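    NumPy has no method named "exclusive", although method='weibull' in np.percentile (NumPy 1.22 and later) uses the same n+1 positions. The standard library (Python 3.8+) provides it directly:

    import statistics

    a = [0, 1, 2, 3, 4]
    statistics.quantiles(a, method='exclusive')
    # [0.5, 2.0, 3.5]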
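To make the two rank formulas concrete, here is a minimal sketch that interpolates the percentiles by hand and compares the result with statistics.quantiles. The helper name percentile and its signature are hypothetical, made up for illustration; it is not part of NumPy or the standard library, and clamping at the ends is my own choice rather than what Excel does.

```python
import statistics


def percentile(sorted_data, p, method="exclusive"):
    """Linearly interpolate the p-th percentile of already sorted data.

    Hypothetical helper for illustration only.
    """
    n = len(sorted_data)
    q = p / 100
    # Fractional 0-based index at which the requested percentile sits.
    if method == "exclusive":
        pos = q * (n + 1) - 1   # element i has rank (i + 1) / (n + 1)
    elif method == "inclusive":
        pos = q * (n - 1)       # element i has rank i / (n - 1)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Assumption: clamp to the data range instead of raising an error.
    pos = min(max(pos, 0), n - 1)
    lower = int(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


a = [0, 1, 2, 3, 4]
exclusive = [percentile(a, p, "exclusive") for p in (25, 50, 75)]
inclusive = [percentile(a, p, "inclusive") for p in (25, 50, 75)]

print(exclusive)  # [0.5, 2.0, 3.5]
print(inclusive)  # [1.0, 2.0, 3.0]

# Both should agree with the standard library for this example.
print(exclusive == statistics.quantiles(a, method="exclusive"))
print(inclusive == statistics.quantiles(a, method="inclusive"))
```

For this five-element example the two methods only differ at Q1 and Q3, which is the difference that shows up between the box plots.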