python numpy performance quantile

Optimize identification of quantiles along array columns


I have an array A (of size m x n) and a fraction p in [0,1]. I need to produce an m x n boolean array B, with True in the (i,j) entry if A[i,j] is at or above the p-th quantile of the column A[:,j].

Here is the code I have used so far.

import numpy as np

m = 200
n = 300

A = np.random.rand(m, n)

p = 0.3

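# One quantile level per column, computed in a Python loop
quant_levels = np.zeros(n)

for i in range(n):
    quant_levels[i] = np.quantile(A[:, i], p)

# quant_levels has shape (n,), so it broadcasts across the rows of A
B = A >= quant_levels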

Solution

  • I'm not sure it's much faster, but you should at least be aware that numpy.quantile has an axis keyword argument, so you can compute all the column quantiles with a single call:

    quant_levels = np.quantile(A, p, axis=0)
    B = (A >= quant_levels)
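
To check whether the axis version is actually faster on your machine, something like the sketch below could help. It is only an illustration: the function names quantile_mask_loop and quantile_mask_axis are made up here, the seed and repeat count are arbitrary, and the timings will depend on your NumPy version and array sizes.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable
m, n = 200, 300
A = rng.random((m, n))
p = 0.3


def quantile_mask_loop(A, p):
    """Loop approach from the question: one np.quantile call per column."""
    levels = np.array([np.quantile(A[:, j], p) for j in range(A.shape[1])])
    return A >= levels


def quantile_mask_axis(A, p):
    """Single np.quantile call along axis 0 (down each column)."""
    levels = np.quantile(A, p, axis=0)  # shape (n,), broadcasts against (m, n)
    return A >= levels


B_loop = quantile_mask_loop(A, p)
B_axis = quantile_mask_axis(A, p)

# Both approaches should mark exactly the same entries.
same = np.array_equal(B_loop, B_axis)
print("masks identical:", same)

# Rough timing; 10 runs each is an arbitrary choice for this sketch.
t_loop = timeit.timeit(lambda: quantile_mask_loop(A, p), number=10)
t_axis = timeit.timeit(lambda: quantile_mask_axis(A, p), number=10)
print(f"loop: {t_loop:.4f}s, axis=0: {t_axis:.4f}s (10 runs each)")
```

If the two masks agree, the only thing left to compare is the timing, and you can rerun this with your real m and n to see whether the difference matters for your use case.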