python pandas numpy missing-data

Scalar-valued isnull()/isnan()/isinf()


In pandas and NumPy, there are vectorized functions like np.isnan, np.isinf, and pd.isnull that check whether the elements of an array, Series, or DataFrame are missing, null, or otherwise invalid.

They also work on scalars: pd.isnull(None) simply returns True rather than pd.Series([True]), which is convenient.

But let's say I want to know whether an arbitrary object is itself one of these null values. None of these functions can tell me that, because they will happily vectorize over a variety of data structures. Using them carelessly will sooner or later lead to the dreaded "The truth value of a Series is ambiguous" error.
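To make the failure concrete, here is a minimal reproduction. The names s, mask, raised, and message are only for illustration:

```python
import pandas as pd

s = pd.Series([None, 1])

# pd.isnull vectorizes over the Series, so the result is a boolean
# Series with one entry per element, not a single bool.
mask = pd.isnull(s)

# Using that Series in a boolean context raises ValueError.
try:
    if mask:
        pass
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

So a check like "if pd.isnull(x):" is only safe when x is already known to be a scalar, which is exactly what I cannot assume.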

What I want is a function like this:

assert not is_scalar_null(3)
assert not is_scalar_null([1,2])
assert not is_scalar_null([None, 1])
assert not is_scalar_null(pd.Series([None, 1]))
assert not is_scalar_null(pd.Series([None, None]))
assert is_scalar_null(None)
assert is_scalar_null(np.nan)

Internally, the pandas function pandas._libs.missing.checknull will do the right thing:

import pandas as pd
import pandas._libs.missing as libmissing
libmissing.checknull(pd.Series([1,2]))  # correctly returns False

But it's generally bad practice to use it: by Python naming convention, the leading underscore marks _libs as private. I'm also not sure about the NumPy equivalents.

Is there an "acceptable", official way to use the same null-checking logic as NumPy and pandas, but strictly for scalars?


Solution

  • Scalar-valued isinf and isnan can be found directly in the math module.

    A basic scalar null check can be done easily:

    from math import isnan
    
    def is_scalar_null(x):
        return x is None or (isinstance(x, float) and isnan(x))
    

    There are probably some uncaptured edge cases here, but it works well enough in my usage. This is also liable to change as pandas enriches its representation of "null" data in recent versions (>= 0.25).
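As one example of such an edge case, the check above only recognizes instances of the built-in float. A slightly broader sketch, using only the standard library, is shown here. It assumes that any float-like scalar type you care about is registered as numbers.Real (as far as I know this includes NumPy floating scalars such as np.float32, but verify this for your versions):

```python
import math
import numbers


def is_scalar_null(x):
    """Return True only if x itself is None or a NaN-valued real scalar."""
    if x is None:
        return True
    # Integers, bools, and fractions can never be NaN. Returning early
    # also avoids OverflowError when a huge int is converted to float.
    if isinstance(x, numbers.Rational):
        return False
    # Built-in float, plus any other type registered as numbers.Real.
    if isinstance(x, numbers.Real):
        return math.isnan(x)
    # Lists, Series, strings, and everything else are not scalar nulls.
    return False


print(is_scalar_null(None), is_scalar_null(float("nan")), is_scalar_null([None, 1]))
```

Containers are never passed to math.isnan, so nothing is vectorized and the result is always a plain bool. If depending on pandas is acceptable, combining the public pandas.api.types.is_scalar with pd.isna may be another option, though I have not checked it against every case in the question.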