python r matlab numpy percentile

NumPy percentile function different from MATLAB's percentile function


When I try to calculate the 75th percentile in MATLAB, I get a different value than I do in NumPy.

MATLAB:

>> x = [ 11.308 ;   7.2896;   7.548 ;  11.325 ;   5.7822;   9.6343;
     7.7117;   7.3341;  10.398 ;   6.9675;  10.607 ;  13.125 ;
     7.819 ;   8.649 ;   8.3106;  12.129 ;  12.406 ;  10.935 ;
    12.544 ;   8.177 ]

>> prctile(x, 75)

ans =

11.3165

Python + NumPy:

>>> import numpy as np

>>> x = np.array([ 11.308 ,   7.2896,   7.548 ,  11.325 ,   5.7822,   9.6343,
     7.7117,   7.3341,  10.398 ,   6.9675,  10.607 ,  13.125 ,
     7.819 ,   8.649 ,   8.3106,  12.129 ,  12.406 ,  10.935 ,
    12.544 ,   8.177 ])

>>> np.percentile(x, 75)
11.312249999999999

I've checked the answer with R too, and I'm getting NumPy's answer.

R:

> x <- c(11.308 ,   7.2896,   7.548 ,  11.325 ,   5.7822,   9.6343,
+          7.7117,   7.3341,  10.398 ,   6.9675,  10.607 ,  13.125 ,
+          7.819 ,   8.649 ,   8.3106,  12.129 ,  12.406 ,  10.935 ,
+         12.544 ,   8.177)
> quantile(x, 0.75)
     75% 
11.31225 

What is going on here? And is there any way to make Python & R's behavior mirror MATLAB's?


Solution

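  • For this data, MATLAB's result matches midpoint interpolation. (Strictly speaking, prctile interpolates linearly too, but it places the k-th of n sorted values at the 100(k - 0.5)/n percentile, which happens to land exactly halfway between two data points here.) NumPy and R use linear interpolation by default: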

    In [182]: np.percentile(x, 75, interpolation='linear')
    Out[182]: 11.312249999999999
    
    In [183]: np.percentile(x, 75, interpolation='midpoint')
    Out[183]: 11.3165
    

    To understand the difference between linear and midpoint, consider this simple example:

    In [187]: np.percentile([0, 100], 75, interpolation='linear')
    Out[187]: 75.0
    
    In [188]: np.percentile([0, 100], 75, interpolation='midpoint')
    Out[188]: 50.0
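
To make the two modes concrete, here is a minimal pure-Python sketch of how the value can be computed by hand. It assumes the position rule (n - 1) * q / 100 on the sorted data, which is my understanding of what NumPy does for these two modes; the function name `percentile_between_neighbors` is made up for illustration and is not part of NumPy.

```python
import math


def percentile_between_neighbors(values, q, mode="linear"):
    """Illustrative percentile using the (n - 1) * q / 100 position rule.

    mode="linear"   interpolates between the two neighboring sorted values
                    according to the fractional part of the position.
    mode="midpoint" returns the plain average of the two neighbors.
    """
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0        # fractional 0-based index
    lo, hi = math.floor(pos), math.ceil(pos)  # neighbors (equal if pos is whole)
    if mode == "linear":
        return data[lo] + (pos - lo) * (data[hi] - data[lo])
    if mode == "midpoint":
        return (data[lo] + data[hi]) / 2.0
    raise ValueError("mode must be 'linear' or 'midpoint'")


print(percentile_between_neighbors([0, 100], 75, "linear"))    # 75.0
print(percentile_between_neighbors([0, 100], 75, "midpoint"))  # 50.0
```

For the 20 values in the question the position is 19 * 0.75 = 14.25, so the neighbors are the 15th and 16th sorted values (11.308 and 11.325): linear gives roughly 11.31225 and midpoint gives roughly 11.3165.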
    

    If your installed NumPy does not accept the interpolation argument, you may need a newer version. To compile the latest version of NumPy (using Ubuntu):

    mkdir -p $HOME/src
    cd $HOME/src
    git clone https://github.com/numpy/numpy.git
    cd numpy    # the remaining commands must run inside the clone
    git remote add upstream https://github.com/numpy/numpy.git
    # Read ~/src/numpy/INSTALL.txt
    sudo apt-get install libatlas-base-dev libatlas3gf-base
    python setup.py build --fcompiler=gnu95
    python setup.py install
    

    The advantage of using git instead of pip is that it is super easy to upgrade (or downgrade) to other versions of NumPy (and you get the source code too):

    cd ~/src/numpy
    git fetch upstream
    git checkout master # or any other branch or tag (newer clones name the default branch main)
    /bin/rm -rf build
    cdsitepackages    # virtualenvwrapper command; otherwise cd to your Python site-packages directory
    /bin/rm -rf numpy numpy-*-py2.7.egg-info
    cd ~/src/numpy
    python setup.py build --fcompiler=gnu95
    python setup.py install
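
Coming back to the question of mirroring MATLAB in general, not just for this data: midpoint only agrees with prctile when the target position happens to fall halfway between two points. A rough pure-Python sketch of the rule I understand prctile to use is below. It assumes the k-th of n sorted values sits at the 100(k - 0.5)/n percentile, with linear interpolation in between and clamping to the minimum and maximum outside that range. The name `percentile_half_sample` is invented for this example.

```python
def percentile_half_sample(values, q):
    """Illustrative percentile where the k-th sorted value (1-based)
    is treated as the 100 * (k - 0.5) / n percentile.

    This is an assumption about MATLAB's prctile, not code taken from it.
    """
    data = sorted(values)
    n = len(data)
    pos = n * q / 100.0 - 0.5    # fractional 0-based index into sorted data
    if pos <= 0:
        return data[0]           # below the first knot: clamp to the minimum
    if pos >= n - 1:
        return data[-1]          # above the last knot: clamp to the maximum
    lo = int(pos)                # floor, since pos is positive here
    frac = pos - lo
    return data[lo] + frac * (data[lo + 1] - data[lo])


x = [11.308, 7.2896, 7.548, 11.325, 5.7822, 9.6343, 7.7117, 7.3341, 10.398,
     6.9675, 10.607, 13.125, 7.819, 8.649, 8.3106, 12.129, 12.406, 10.935,
     12.544, 8.177]
print(round(percentile_half_sample(x, 75), 4))  # 11.3165
```

If you would rather not hand-roll this, NumPy 1.22 and later rename the interpolation keyword to method and, as far as I can tell, method='hazen' applies the same rule; in R, quantile(x, 0.75, type = 5) should be the equivalent. Treat both as things to verify against your own MATLAB output rather than guarantees.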