python numpy percentile

Understanding the subtle difference in calculating percentile


When calculating the percentile using numpy, I see some authors use:

Q1, Q3 = np.percentile(X, [25, 75])

which is clear to me. However, I also see others use:

loss = np.percentile(X, 4)

I presume the 4 means dividing the range 0 to 100 into 4 parts (quartiles), but how is the loss calculated in this second case?


Solution

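  • I don't know where you found the second case, but it's incorrect (or misinterpreted) if it was meant to produce quartiles.

    np.percentile(X, 4) simply calculates the 4th percentile. The second argument is always a percentile between 0 and 100 (or a list of them), never a number of groups to split the data into.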

    import numpy as np

    # Integers 0..100, so the q-th percentile is simply q
    X = np.arange(0, 101)
    
    np.percentile(X, [25, 75])
    # array([25., 75.])
    
    np.percentile(X, 4)
    # 4.0
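
If the intent behind the second snippet really was "split the data into 4 parts", a minimal sketch of what that would look like is below. It reuses the same illustrative X as above; the variable names quartiles, same_quartiles and p4 are mine, not from the original code.

```python
import numpy as np

# Same illustrative data as above: the integers 0..100.
X = np.arange(0, 101)

# Quartiles: pass the three cut points explicitly.
quartiles = np.percentile(X, [25, 50, 75])

# np.quantile does the same computation with q on a 0-1 scale.
same_quartiles = np.quantile(X, [0.25, 0.50, 0.75])

# The 4th percentile is a different quantity: a single value
# near the low end of the data, not a 4-way split.
p4 = np.percentile(X, 4)

print(quartiles, same_quartiles, p4)
```

So a line like loss = np.percentile(X, 4) most likely just picks a value from the lower tail of X, which may well be what that author intended.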