Tags: python, dataframe, conditional-statements, python-polars, boolean-logic

Check if all values of Polars DataFrame are True


How can I check if all values of a polars DataFrame, containing only boolean columns, are True?
Example df:

df = pl.DataFrame({"a": [True, True, None],
                   "b": [True, True, True],
                   })

The reason for my question is that sometimes I want to check if all values of a df fulfill a condition, like in the following:

df = pl.DataFrame({"a": [1, 2, None],
                   "b": [4, 5, 6],
}).select(pl.all() >= 1)

By the way, I didn't expect that .select(pl.all() >= 1) would keep the null (None) in the last row of column "a"; maybe that's worth noting.


Solution

  • As of the date of this edit, I found the following code to be the most idiomatic approach in polars (also in terms of performance):

    df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)
    

    Note that this code can return True, False or None. None is returned (which prints nothing in a REPL) when the frame contains only True values and nulls.

    Example with the df from the question:

    
    >>> df = pl.DataFrame({"a": [True, True, None],
    ...                    "b": [True, True, True],
    ... })
    >>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)  # Prints nothing: None is returned because of the `None` in the df.
    >>> df = pl.DataFrame({"a": [True, True, True],
    ...                    "b": [True, True, True],
    ... })
    >>> df.fold(lambda s1, s2: s1 & s2).all(ignore_nulls=False)
    True
    

    If no null values exist in the df, ignore_nulls=False (whose default is True) can be omitted.




    To finish off, here is the second-best answer I found; it is less straightforward and a bit slower:

    df.mean_horizontal(ignore_nulls=False).eq_missing(1).all()
    

    However, the advantage of this one is that it can only return True or False (never None).
    It works because True counts as 1 and False as 0, so the mean of a row equals 1 exactly when the row contains only True values; with ignore_nulls=False, a row containing a null gets a null mean, and eq_missing(1) evaluates that null to False.