python numpy quantile iqr

Exclude zeros in NumPy quantile calculation of rows of an array


I have a 2D-array with zero values in each row.

[[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
 [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
 [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]]

Is there a way to calculate the 0.75 quantile of each row while excluding the zero values from the calculation?

For example, in the second row, only the 6 non-zero values [12, 1, 2, 30, 2, 2] should be used in the calculation. I tried using np.quantile(), but it includes all the zero values in the calculation. It also seems that NumPy doesn't have a masked-array (np.ma) version of quantile().


Solution

  • You can replace the zero values with nan and pass the array to np.nanquantile(), which computes the quantile of the non-nan values only:

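    >>> import numpy as np
    >>> # a float dtype is required, because nan cannot be stored in an integer array
    >>> arr = np.array([[5, 3, 2, 0, 0, 1, 6, 9, 11, 1, 4, 1],
                        [0, 0, 12, 0, 1, 0, 0, 2, 0, 30, 2, 2],
                        [120, 2, 10, 3, 0, 0, 2, 7, 9, 5, 0, 0]], dtype='f')
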
    >>> arr[arr==0] = np.nan
    >>> print(arr)
    [[  5.   3.   2.  nan  nan   1.   6.   9.  11.   1.   4.   1.]
     [ nan  nan  12.  nan   1.  nan  nan   2.  nan  30.   2.   2.]
     [120.   2.  10.   3.  nan  nan   2.   7.   9.   5.  nan  nan]]
    
    >>> arr_quantile75 = np.nanquantile(arr, 0.75, axis=1)  # axis=1: one quantile per row
    >>> print(arr_quantile75)
    [5.75 9.5  9.25]
    

    np.nanquantile() computes the q-th quantile of the data along the specified axis while ignoring nan values.