I was trying to create some box plots of my data. I first plotted them in Excel, but wanted to move to Python to be able to customize them. However, I was surprised to see that the two programs place the quartiles Q1 and Q3 at different positions in the box plots.
I noticed that Excel uses the exclusive median calculation to display the quartiles by default. Is there something similar among the methods for calculating percentiles in NumPy, and if so, how are these calculated? I have not been able to find good information on each of the methods.
Thank you in advance,
I tried all of the quantile calculation methods in NumPy and could not reproduce the values given by Excel's exclusive median method.
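Given a sorted array a of length n, the two methods assign these percentile ranks (with i counted from 1):
Exclusive: the rank of the i-th element is i / (n + 1)
Inclusive: the rank of the i-th element is (i - 1) / (n - 1)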
Example for a = [0, 1, 2, 3, 4]:
Exclusive: np.arange(1, len(a)+1) / (len(a)+1) gives
array([0.16666667, 0.33333333, 0.5 , 0.66666667, 0.83333333])
i.e. the 25th percentile is halfway between the first and second elements (= 0.5)
Inclusive: np.arange(len(a)) / (len(a)-1) gives
array([0. , 0.25, 0.5 , 0.75, 1. ])
i.e. the 25th percentile is the second element (= 1)
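To make the link between these ranks and the resulting quartiles explicit, here is a minimal pure-Python sketch that interpolates linearly between the ranks. The helper names percentile_ranks and quantile are made up for illustration, and clamping to the first or last element outside the covered range is my own assumption (Excel's exclusive functions return an error there instead).

```python
def percentile_ranks(n, exclusive=True):
    """Rank (as a fraction) assigned to each of n sorted elements (n >= 2)."""
    if exclusive:
        return [i / (n + 1) for i in range(1, n + 1)]
    return [(i - 1) / (n - 1) for i in range(1, n + 1)]


def quantile(data, p, exclusive=True):
    """Linearly interpolate the sorted data at rank p (0 <= p <= 1)."""
    a = sorted(data)
    ranks = percentile_ranks(len(a), exclusive)
    # Assumption: clamp to the end elements outside the ranks covered by the data.
    if p <= ranks[0]:
        return a[0]
    if p >= ranks[-1]:
        return a[-1]
    for j in range(1, len(a)):
        if p <= ranks[j]:
            # Fraction of the way from element j-1 to element j.
            t = (p - ranks[j - 1]) / (ranks[j] - ranks[j - 1])
            return a[j - 1] + t * (a[j] - a[j - 1])


a = [0, 1, 2, 3, 4]
exclusive_q = [quantile(a, p, exclusive=True) for p in (0.25, 0.5, 0.75)]
inclusive_q = [quantile(a, p, exclusive=False) for p in (0.25, 0.5, 0.75)]
print(exclusive_q)  # Q1, median, Q3 with the exclusive ranks
print(inclusive_q)  # Q1, median, Q3 with the inclusive ranks
```

For the example array this should give Q1 and Q3 of about 0.5 and 3.5 with the exclusive ranks, and 1 and 3 with the inclusive ones, up to floating-point rounding.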
NumPy has no percentile method named "exclusive", but you can use the standard library's statistics.quantiles:
import statistics
a = [0, 1, 2, 3, 4]
# default n=4 returns the three quartile cut points
statistics.quantiles(a, method='exclusive')
# [0.5, 2.0, 3.5]
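If you would rather stay within NumPy: newer versions (1.22 and later, to my knowledge) accept a method argument in np.quantile and np.percentile, and method="weibull" appears to use the same i / (n + 1) ranks as the exclusive definition, while the default method="linear" matches the inclusive one. A minimal sketch under that assumption, worth checking against your own Excel output:

```python
import numpy as np

a = [0, 1, 2, 3, 4]
probs = [0.25, 0.5, 0.75]

# Assumes NumPy >= 1.22, where the `method` keyword is available.
# "weibull" places the i-th sorted element at rank i / (n + 1).
exclusive_q = np.quantile(a, probs, method="weibull")

# "linear" (the default) places it at rank (i - 1) / (n - 1).
inclusive_q = np.quantile(a, probs, method="linear")

print(exclusive_q)
print(inclusive_q)
```

On the example array the first result should agree with statistics.quantiles(a, method='exclusive') and the second with method='inclusive'.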