In Pandas and NumPy, there are vectorized functions like np.isnan, np.isinf, and pd.isnull to check whether the elements of an array, Series, or DataFrame are various kinds of missing/null/invalid. They also work on scalars: pd.isnull(None) simply returns True rather than pd.Series([True]), which is convenient.
But let's say I want to know whether an arbitrary object is one of these null values. You can't do that reliably with any of these functions, because they will happily vectorize over a variety of data structures. Carelessly using them will inevitably lead to the dreaded "The truth value of a Series is ambiguous" error.
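A quick sketch of how that error arises (the `if` is left commented out because running it would raise):

```python
import pandas as pd

s = pd.Series([None, 1])
print(pd.isnull(s))  # vectorized: returns a boolean Series, not a single bool
# if pd.isnull(s):   # raises ValueError: The truth value of a Series is ambiguous
#     ...
```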
What I want is a function like this:
assert not is_scalar_null(3)
assert not is_scalar_null([1,2])
assert not is_scalar_null([None, 1])
assert not is_scalar_null(pd.Series([None, 1]))
assert not is_scalar_null(pd.Series([None, None]))
assert is_scalar_null(None)
assert is_scalar_null(np.nan)
Internally, the Pandas function pandas._libs.missing.checknull will do the right thing:
import pandas._libs.missing as libmissing
libmissing.checknull(pd.Series([1,2])) # correctly returns False
But it's generally bad practice to rely on it; according to Python naming conventions, _libs is private. I'm also not sure what the NumPy equivalents would be.
Is there an "acceptable" but official way to use the same null-checking logic as NumPy and Pandas, but strictly for scalars?
Scalar-valued isinf and isnan can be found directly in the math module.
A basic scalar null check can be done easily:
from math import isnan

def is_scalar_null(x):
    return x is None or (isinstance(x, float) and isnan(x))
There is probably some uncaptured edge case here, but it works well enough in my usage. It is also liable to break down as Pandas enriches its representation of "null" data in recent versions (>= 0.25).
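An alternative sketch that stays within the documented public API, assuming pd.api.types.is_scalar and pd.isna cover the cases above:

```python
import pandas as pd

def is_scalar_null(x):
    # is_scalar rejects lists, arrays, and Series outright,
    # so pd.isna is only ever called on a true scalar here.
    return pd.api.types.is_scalar(x) and bool(pd.isna(x))

assert not is_scalar_null(3)
assert not is_scalar_null([None, 1])
assert not is_scalar_null(pd.Series([None, None]))
assert is_scalar_null(None)
assert is_scalar_null(float("nan"))
```

A possible bonus of delegating to pd.isna is that Pandas-specific null scalars such as pd.NaT should also be caught, which the float-based check above does not handle.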