Calling masked_array (the class constructor) and calling the masked_where function seem to do exactly the same thing: both construct a numpy masked array from the data and mask values. When would you use one rather than the other?
>>> import numpy as np
>>> import numpy.ma as MA
>>> vals = np.array([0,1,2,3,4,5])
>>> cond = vals > 3
>>> vals
array([0, 1, 2, 3, 4, 5])
>>> cond
array([False, False, False, False, True, True], dtype=bool)
>>> MA.masked_array(data=vals, mask=cond)
masked_array(data = [0 1 2 3 -- --],
mask = [False False False False True True],
fill_value = 999999)
>>> MA.masked_where(cond, vals)
masked_array(data = [0 1 2 3 -- --],
mask = [False False False False True True],
fill_value = 999999)
The optional argument copy to masked_where (its only documented optional argument) is also supported by masked_array, so I don't see any options that are unique to masked_where. The converse is not true (e.g. masked_where doesn't support dtype), but I still don't understand the purpose of masked_where as a separate function.
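One concrete difference, as far as I can tell from the NumPy docs: the two defaults for copy are not the same. masked_where defaults to copy=True, while the MaskedArray constructor (masked_array) defaults to copy=False, so by default only the constructor's result is a view of vals. A minimal sketch, assuming a reasonably recent NumPy:

```python
import numpy as np
import numpy.ma as MA

vals = np.array([0, 1, 2, 3, 4, 5])
cond = vals > 3

# Default call of each: the constructor uses copy=False, masked_where uses copy=True
from_class = MA.masked_array(vals, mask=cond)
from_where = MA.masked_where(cond, vals)

# Same data and same mask either way
print(np.array_equal(from_class.data, from_where.data))  # True
print(np.array_equal(from_class.mask, from_where.mask))  # True

# ...but only the constructor's result shares its buffer with vals by default
print(np.shares_memory(from_class.data, vals))  # True
print(np.shares_memory(from_where.data, vals))  # False

# Passing copy=False to masked_where gives a view as well
same_buffer = MA.masked_where(cond, vals, copy=False)
print(np.shares_memory(same_buffer.data, vals))  # True
```

Passing copy explicitly to either function makes the memory behaviour match, so the choice mostly comes down to which call reads more naturally.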
You comment:
If I call them with inconsistently shaped value and masked arrays, I get the same error message in both cases.
I don't think we can help you without more details on what's different.
For example, if I try the obvious inconsistency, a length mismatch, I get different error messages (in this session vals has 5 elements):
In [121]: np.ma.masked_array(vals, cond[:-1])
MaskError: Mask and data not compatible: data size is 5, mask size is 4.
In [122]: np.ma.masked_where(cond[:-1], vals)
IndexError: Inconsistent shape between the condition and the input (got (4,) and (5,))
The test behind the masked_where message is obvious from the code that Corralien shows.
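For reference, here is a simplified, hypothetical paraphrase of that check. masked_where_sketch is an invented name, and this is not NumPy's actual source: it ignores structured dtypes and inputs that already carry a mask. The point it illustrates is that a 0-d condition is let through, while any other shape must equal the data's shape exactly, even when the element count matches.

```python
import numpy as np

def masked_where_sketch(condition, a, copy=True):
    """Hypothetical simplification of the shape check in np.ma.masked_where.

    Not NumPy's implementation: structured dtypes and existing masks are
    ignored; it only shows where the IndexError comes from.
    """
    cond = np.asarray(condition, dtype=bool)
    data = np.array(a, copy=True) if copy else np.asarray(a)
    # A 0-d (scalar) condition has an empty shape tuple and is accepted;
    # any other shape must match exactly, not just in total size.
    if cond.shape and cond.shape != data.shape:
        raise IndexError(
            "Inconsistent shape between the condition and the input"
            f" (got {cond.shape} and {data.shape})"
        )
    # The final step is plain MaskedArray construction
    return np.ma.masked_array(data, mask=cond)


vals = np.array([0, 1, 2, 3, 4, 5])
cond = vals > 3
print(masked_where_sketch(cond, vals))  # [0 1 2 3 -- --]
try:
    masked_where_sketch(cond[:-1], vals)
except IndexError as exc:
    print(exc)  # Inconsistent shape between the condition and the input (got (5,) and (6,))
```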
The MaskedArray class definition has this test:
# Make sure the mask and the data have the same shape
if mask.shape != _data.shape:
    (nd, nm) = (_data.size, mask.size)
    if nm == 1:
        mask = np.resize(mask, _data.shape)
    elif nm == nd:
        mask = np.reshape(mask, _data.shape)
    else:
        msg = "Mask and data not compatible: data size is %i, " + \
              "mask size is %i."
        raise MaskError(msg % (nd, nm))
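The same size-based rules can be restated as a small standalone helper for experimenting with which masks the constructor accepts. coerce_mask_sketch is a hypothetical name, and this is only a paraphrase of the quoted branch, not the real constructor:

```python
import numpy as np

def coerce_mask_sketch(mask, data):
    """Hypothetical helper mirroring the size-based fix-up quoted above."""
    mask = np.asarray(mask, dtype=bool)
    data = np.asarray(data)
    if mask.shape != data.shape:
        nd, nm = data.size, mask.size
        if nm == 1:
            # A single value is repeated to fill the data's shape
            mask = np.resize(mask, data.shape)
        elif nm == nd:
            # Same number of elements: only the shape is changed
            mask = np.reshape(mask, data.shape)
        else:
            raise np.ma.MaskError(
                f"Mask and data not compatible: data size is {nd}, mask size is {nm}."
            )
    return mask


vals = np.array([0, 1, 2, 3, 4, 5])
cond = vals > 3
print(coerce_mask_sketch(cond[:, None], vals).shape)  # (6,)
print(coerce_mask_sketch(True, vals).all())           # True
```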
I'd expect the same message only if the shapes made it past the masked_where test but were then caught by the class's test. If so, that should be obvious from the full error traceback.
Here's an example that fails the masked_where test but passes the base class's:
In [138]: np.ma.masked_where(cond[:,None],vals)
IndexError: Inconsistent shape between the condition and the input (got (5, 1) and (5,))
In [139]: np.ma.masked_array(vals, cond[:,None])
Out[139]:
masked_array(data=[--, 1, --, 3, --],
mask=[ True, False, True, False, True],
fill_value=999999)
The base class can handle a cond that differs in shape but matches in size (total number of elements); it reshapes the mask to the data's shape. A scalar cond passes both, though the exact test differs.
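Using the question's vals and cond, a short sketch of both points: a same-size condition with a different shape only works with masked_where after you reshape it yourself, and a scalar condition masks everything in either case.

```python
import numpy as np

vals = np.array([0, 1, 2, 3, 4, 5])
cond = vals > 3
col = cond[:, None]  # shape (6, 1): same size as vals, different shape

# The class constructor reshapes a same-size mask by itself
via_class = np.ma.masked_array(vals, mask=col)

# masked_where rejects the (6, 1) condition, so reshape it to vals.shape first
via_where = np.ma.masked_where(col.reshape(vals.shape), vals)
print(via_class.mask.tolist() == via_where.mask.tolist())  # True

# A scalar condition is accepted by both and masks every element
print(np.ma.masked_array(vals, mask=True).count())  # 0
print(np.ma.masked_where(True, vals).count())       # 0
```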
Based on my reading of the code, I can't conceive of a difference that passes the masked_where test but not the base class's.
All of the masked array code is readable Python (see the link in the other answer). While there is one base class definition, there are a number of constructor or helper functions, as the masked_where docs make clear. I wouldn't worry too much about which function(s) to use, especially if you aren't trying to push the boundaries of what's logical.
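As an illustration of those helpers, the sketch below compares masked_where with masked_greater, and shows masked_invalid and masked_equal building the condition for you. The readings array and the -999 sentinel are made-up example data:

```python
import numpy as np

vals = np.array([0, 1, 2, 3, 4, 5])

# masked_greater builds the "vals > 3" condition itself
by_where = np.ma.masked_where(vals > 3, vals)
by_helper = np.ma.masked_greater(vals, 3)
print(by_where.mask.tolist() == by_helper.mask.tolist())  # True

# masked_invalid masks NaN and inf without an explicit condition
readings = np.array([1.5, np.nan, 2.0, np.inf])
print(np.ma.masked_invalid(readings).mask.tolist())  # [False, True, False, True]

# masked_equal masks a sentinel value; 2 unmasked elements remain
print(np.ma.masked_equal(np.array([3, -999, 7]), -999).count())  # 2
```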
Masked arrays, while part of numpy for a long time, do not get a whole lot of use, at least judging by the relative lack of SO questions. I suspect pandas has largely replaced them for data that can have missing values (e.g. time series).