I was earlier able to filter duplicates based on multiple columns using df.filter(pl.col(['A','C']).is_duplicated()) but after the latest version update this is not working.
import polars as pl
df = pl.DataFrame(
{
"A": [1,4,4,7,7,10,10,13,16],
"B": [2,5,5,8,18,11,11,14,17],
"C": [3,6,6,9,9,12,12,15,18]
}
)
Both attempts generate an error:
df.filter(pl.col(['A','C']).is_duplicated())
# ComputeError: The predicate passed to 'LazyFrame.filter' expanded to multiple expression
df.filter(
df.select(pl.col(['A','C']).is_duplicated())
)
# TypeError: invalid predicate for `filter`
This behavior was noted as ambiguous in 0.16.10 and would return this error:
exceptions.ComputeError: The predicate passed to 'LazyFrame.filter' expanded to multiple expressions:
col("A").is_duplicated(),
col("C").is_duplicated(),
This is ambiguous. Try to combine the predicates with the 'all' or `any' expression.
However 0.19.0 removed the deprecated behavior of all/any replaced by all_horizontal and any_horizontal. To get the same behavior as the pre-0.16.10, use df.filter(pl.all_horizontal(pl.col(['A','C']).is_duplicated()))
I've modified the input slightly to reflect the differences between any_horizontal and all_horizontal
import polars as pl
df = pl.DataFrame(
{
"A": [1,3,4,7,7,10,10,13,16],
"B": [2,5,5,8,18,11,11,14,17],
"C": [3,6,6,9,9,12,12,15,18]
}
)
# print("legacy run in 0.16.9: ", df.filter(pl.col(['A','C']).is_duplicated()))
print("all_horizontal: ", df.filter(pl.all_horizontal(pl.col(['A','C']).is_duplicated())))
print("any_horizontal: ", df.filter(pl.any_horizontal(pl.col(['A','C']).is_duplicated())))
legacy run in 0.16.9: shape: (4, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 7 ┆ 8 ┆ 9 │
│ 7 ┆ 18 ┆ 9 │
│ 10 ┆ 11 ┆ 12 │
│ 10 ┆ 11 ┆ 12 │
└─────┴─────┴─────┘
all_horizontal: shape: (4, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 7 ┆ 8 ┆ 9 │
│ 7 ┆ 18 ┆ 9 │
│ 10 ┆ 11 ┆ 12 │
│ 10 ┆ 11 ┆ 12 │
└─────┴─────┴─────┘
any_horizontal: shape: (6, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 3 ┆ 5 ┆ 6 │
│ 4 ┆ 5 ┆ 6 │
│ 7 ┆ 8 ┆ 9 │
│ 7 ┆ 18 ┆ 9 │
│ 10 ┆ 11 ┆ 12 │
│ 10 ┆ 11 ┆ 12 │
└─────┴─────┴─────┘