I am trying to do basically the opposite of drop_nulls(). I want to keep all rows that have at least one null.
I want to do something like (but I don't want to list all other columns):
for (name,) in (
    df.filter(
        pl.col("a").is_null()
        | pl.col("b").is_null()
        | pl.col("c").is_null()
    )
    .select("name")
    .unique()
    .rows()
):
    print(
        f"Ignoring `{name}` because it has at least one null",
        file=sys.stderr,
    )
df = df.drop_nulls()
It sounds like you are looking for pl.any_horizontal (a top-level function, not a method on pl.Expr). The following will keep all rows containing at least one null value (in any of the columns).
df.filter(pl.any_horizontal(pl.all().is_null()))