
Check if any value in a Polars DataFrame is True


This is quite a simple ask, but I can't seem to find a clear, simple solution to it; it feels like I'm missing something.

Let's say I have a DataFrame like this:

import polars as pl

df = pl.from_repr("""
┌───────┬───────┬───────┐
│ a     ┆ b     ┆ c     │
│ ---   ┆ ---   ┆ ---   │
│ bool  ┆ bool  ┆ bool  │
╞═══════╪═══════╪═══════╡
│ false ┆ true  ┆ false │
│ false ┆ false ┆ false │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘
""")

How do I simply check whether any of the values in the DataFrame is True? Some solutions I have found are

selection = df.select(pl.all().any(ignore_nulls=True))

or

selection = df.filter(pl.any_horizontal(pl.all()))

and then check in that row

any(selection.row(0))

It just seems like so many steps for a single check.


Solution

  • These two options are a bit shorter and stay in pure Polars.

    # Unpivot all the booleans into a single "value" column,
    # then pull the "value" column out as a Series and call .any() on it
    df.unpivot()["value"].any()
    
    # pl.all().any() checks for any True values per column
    # pl.any_horizontal() checks horizontally per row, reducing to a single value
    df.select(pl.any_horizontal(pl.all().any())).item()
    

    To your question

    This is quite a simple ask, but I can't seem to find a clear, simple solution to it; it feels like I'm missing something. It just seems like so many steps for a single check.

    You are not missing anything. It feels like a bit more work because a DataFrame is better thought of as a (database) table than as a matrix. Generally you have different columns of potentially different types, and you want to do different calculations on different columns. So reducing both dimensions to a single value in a single step is just not something DataFrame libraries typically offer.

    NumPy is much better suited to matrices and does offer this in a single step.

    arr = df.to_numpy()
    arr.any() # True
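
As a rough illustration of why this is a single step in NumPy, the sketch below reduces the same values over both axes at once and also over one axis at a time. The array literal is an assumption standing in for `df.to_numpy()` so the snippet runs without Polars.

```python
import numpy as np

# Stand-in for df.to_numpy(): rows are DataFrame rows, columns are a, b, c.
arr = np.array(
    [
        [False, True, False],
        [False, False, False],
        [False, False, False],
    ]
)

# No axis given: reduce over every element in one call.
any_overall = bool(arr.any())

# axis=0 collapses the rows, giving one result per column (a, b, c).
any_per_column = arr.any(axis=0).tolist()

# axis=1 collapses the columns, giving one result per row.
any_per_row = arr.any(axis=1).tolist()

print(any_overall)     # True
print(any_per_column)  # [False, True, False]
print(any_per_row)     # [True, False, False]
```

The per-column and per-row results correspond to what `pl.all().any()` and `pl.any_horizontal(pl.all())` compute in Polars.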