I have this example Polars DataFrame:
import polars as pl
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, None, 5, 10, 20],
    "variable2": [40, 30, 50, 10, None],
})
I'm trying to filter my DataFrame across all columns, keeping only the rows with no nulls, using pl.all(). I also tried pl.any_horizontal() == Condition. However, I'm getting the following error:
ComputeError: The predicate passed to 'LazyFrame.filter' expanded to multiple expressions:
col("id").is_not_null(),
col("variable1").is_not_null(),
col("variable2").is_not_null(),
This is ambiguous. Try to combine the predicates with the 'all' or `any' expression.
Here are my attempts to solve this:
# Attempt 1:
(
    df
    .filter(
        pl.all().is_not_null()
    )
)
# Attempt 2:
(
    df
    .filter(
        pl.any_horizontal().is_not_null()
    )
)
This gives the desired output, but it doesn't scale to DataFrames with many columns:
(
    df
    .filter(
        pl.col("variable1").is_not_null(),
        pl.col("variable2").is_not_null()
    )
)
How can I filter all columns in a scalable way without specifying each column individually?
You need to collapse the multiple expressions generated by pl.all() (imagine three boolean columns coming out of it, one for each input column) into a single column. You can do that with pl.all_horizontal(your, columns, here):
>>> df.filter(pl.all_horizontal(pl.col('*').is_not_null()))
shape: (3, 3)
┌─────┬───────────┬───────────┐
│ id ┆ variable1 ┆ variable2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═══════════╪═══════════╡
│ 1 ┆ 15 ┆ 40 │
│ 3 ┆ 5 ┆ 50 │
│ 4 ┆ 10 ┆ 10 │
└─────┴───────────┴───────────┘