[SOLVED] Exclude zeros in NumPy quantile calculation of rows of an array

I have a 2D-array with zero values in each row.

[[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
 [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
 [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]]

Is there a way to calculate the 0.75 quantile of each row while excluding the zero values from the calculation?

For example, in the second row, only the 6 non-zero values [12, 1, 2, 30, 2, 2] should be used. I tried np.quantile(), but it includes all the zero values in the calculation. NumPy also does not seem to have a masked-array (np.ma) version of quantile().

Solution

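You can replace the zero values with nan and pass the array to np.nanquantile(), which computes the quantile over the non-nan values only. The array needs a floating-point dtype for this, since integer arrays cannot store nan.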

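>>> import numpy as np
>>> # float dtype ('f' is float32) so the array can hold nan
>>> arr = np.array([[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
...                 [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
...                 [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]], dtype='f')

>>> arr[arr == 0] = np.nan  # replace zeros in place
>>> print(arr)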
[[  5.   3.   2.  nan  nan   1.   6.   9.  11.   1.   4.   1.]
 [ nan  nan  12.  nan   1.  nan  nan   2.  nan  30.   2.   2.]
 [120.   2.  10.   3.  nan  nan   2.   7.   9.   5.  nan  nan]]

>>> arr_quantile75 = np.nanquantile(arr, 0.75, axis=1)  # axis=1: one quantile per row
>>> print(arr_quantile75)
[5.75 9.5  9.25]

According to the NumPy documentation, np.nanquantile() computes the q-th quantile of the data along the specified axis, while ignoring nan values.
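
If you would rather not overwrite the original array, the same idea can be written without in-place assignment. The following is only a sketch, not part of the original answer: the names masked, q75 and q75_check are made up for illustration, and it assumes the input may be an integer array. np.where builds a new float array with nan in place of the zeros, and the last step cross-checks the result by taking np.quantile() over each row's non-zero entries.

```python
import numpy as np

arr = np.array([[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
                [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
                [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]])

# np.where returns a new float array; the integer input is left untouched.
masked = np.where(arr == 0, np.nan, arr)

# One 0.75 quantile per row, ignoring the nan placeholders.
q75 = np.nanquantile(masked, 0.75, axis=1)

# Cross-check: quantile of each row's non-zero entries, computed row by row.
q75_check = np.array([np.quantile(row[row != 0], 0.75) for row in arr])

print(q75)
print(q75_check)
```

Both approaches should give the same values as above (5.75, 9.5, 9.25). The row-by-row version is easier to read but loops in Python, so the nan-based version is probably the better choice for large arrays.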