python numpy masked-array

Why does np.ma.array take an inverted mask?


I have some array

a = np.array([1, 2, 3])

and some mask

mask = np.ones(a.shape, dtype=bool)

and can do

np.testing.assert_almost_equal(a[mask], a)  # passes (raises no AssertionError)

However,

np.ma.array(a, mask=mask)

is equivalent to

a[np.logical_not(mask)]

and

np.ma.array(a, mask=np.logical_not(mask))

is equivalent to

a[mask]

This seems counterintuitive to me.

I would love an explanation for this design choice in NumPy.


Solution

  • In [6]: a = np.array([1,2,3])                                                            
    In [7]: idx = np.array([1,0,1], bool)                                                    
    In [8]: idx                                                                              
    Out[8]: array([ True, False,  True])
    In [9]: a[idx]                                                                           
    Out[9]: array([1, 3])
    

    Just because you called a boolean array mask does not mean it behaves as a 'mask' in every sense of the word. I intentionally chose a different name. Yes, we do often call such an array a mask and talk of 'masking', but what we are really doing is 'selecting'. The a[idx] operation returns the elements of a where idx is True. It's the same as indexing with the nonzero tuple:

    In [13]: np.nonzero(idx)                                                                 
    Out[13]: (array([0, 2]),)
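
    As a quick check of that equivalence, here is a minimal sketch that compares boolean indexing with indexing by the np.nonzero tuple. The names selected_bool and selected_int are only illustrative; a and idx are the arrays from the session above.

```python
import numpy as np

a = np.array([1, 2, 3])
idx = np.array([1, 0, 1], dtype=bool)

# Boolean indexing keeps the elements where idx is True ...
selected_bool = a[idx]
# ... which matches indexing with the tuple of integer positions.
selected_int = a[np.nonzero(idx)]

print(selected_bool)  # [1 3]
print(selected_int)   # [1 3]
```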
    

    In np.ma, 'mask' is used in the sense of 'mask out', i.e. covering over.

    In [10]: mm = np.ma.masked_array(a, mask=idx)                                            
    In [11]: mm                                                                              
    Out[11]: 
    masked_array(data=[--, 2, --],
                 mask=[ True, False,  True],
           fill_value=999999)
    In [12]: mm.compressed()                                                                 
    Out[12]: array([2])
    

    In the display, the masked values show up as '--'. As the np.ma docs say, those elements are considered invalid and are excluded from computations.
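
    To illustrate what "excluded from computations" means, the sketch below rebuilds the same masked array and runs a few reductions on it. The names total, average and count are hypothetical, chosen for this example only.

```python
import numpy as np

a = np.array([1, 2, 3])
idx = np.array([1, 0, 1], dtype=bool)
mm = np.ma.masked_array(a, mask=idx)

# Reductions skip the masked (True) positions, so only the 2 is used.
total = mm.sum()
average = mm.mean()
# count() reports how many elements are NOT masked.
count = mm.count()

print(total, average, count)  # 2 2.0 1
```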

    mm.filled() returns a new array with the masked values replaced by the fill value:

    In [16]: mm.filled()                                                                     
    Out[16]: array([999999,      2, 999999])
    

    We can do the same thing with idx, although this assignment modifies a in place:

    In [17]: a[idx] = 999999                                                                 
    In [18]: a                                                                               
    Out[18]: array([999999,      2, 999999])
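
    Putting the two conventions side by side: a selection array marks what to keep, while an np.ma mask marks what to hide, so converting between them is a logical negation. This is a minimal sketch; keep, selected and masked are hypothetical names introduced for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
keep = np.array([True, False, True])  # True means "select this element"

# Boolean indexing returns the elements marked True.
selected = a[keep]

# np.ma treats True as "hide this element", so invert the selection.
masked = np.ma.masked_array(a, mask=~keep)

print(selected)             # [1 3]
print(masked.compressed())  # [1 3]
```

    With the mask inverted, compressed() returns the same elements as the boolean index.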