This is quite a simple ask, but I can't seem to find a clear, simple solution; it feels like I'm missing something.
Let's say I have a DataFrame like this:
df = pl.from_repr("""
┌───────┬───────┬───────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ bool │
╞═══════╪═══════╪═══════╡
│ false ┆ true ┆ false │
│ false ┆ false ┆ false │
│ false ┆ false ┆ false │
└───────┴───────┴───────┘
""")
How do I do a simple check for whether any value in the DataFrame is True? Some solutions I have found are
selection = df.select(pl.all().any(ignore_nulls=True))
or
selection = df.filter(pl.any_horizontal(pl.all()))
and then checking the first row of the result:
any(selection.row(0))
It just seems like so many steps for a single check.
These two options are a bit shorter and stay in pure Polars.
# Unpivot all the booleans into a single "value" column
# Pull the "value" column out as a Series and call .any() on it
df.unpivot()["value"].any()
# pl.all().any() checks for any True values per column
# pl.any_horizontal() checks horizontally per row, reducing to a single value
df.select(pl.any_horizontal(pl.all().any())).item()
To your question:
"This is quite a simple ask but I can't seem to find any clear simplistic solution to this, feels like I'm missing something. It just seems like so many steps for a single check"
You are not missing anything. The reason it feels like a bit more work is that a DataFrame is better thought of as a (database) table. Generally you have different columns of potentially different types, and you want to do different calculations on different columns. So reducing both dimensions into a single value in a single step is just not something typically offered by DataFrame libraries.
NumPy is much better suited if your data is a matrix, and it does offer this in a single step.
arr = df.to_numpy()
arr.any() # True
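For comparison, here is a small standalone sketch of how NumPy handles the reduction. The array is typed in by hand to mirror the example DataFrame, so it runs without Polars; the `axis` argument selects per-column or per-row reduction, and omitting it reduces over both dimensions at once.

```python
import numpy as np

# Same values as the example DataFrame, written out by hand
arr = np.array([
    [False, True, False],
    [False, False, False],
    [False, False, False],
])

print(arr.any())                 # True: reduces over both axes at once
print(arr.any(axis=0).tolist())  # per column: [False, True, False]
print(arr.any(axis=1).tolist())  # per row: [True, False, False]
```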