[SOLVED] Drop rows with all zeros in Polars DataFrame

I can use the drop_nans() function to remove rows in which some or all columns are NaN.

Is there an equivalent function for dropping rows in which every column has the value 0?

import polars as pl
df = pl.DataFrame({"a":[0, 0, 0, 0, 30], 
                   "b":[0, 0, 0, 0, 40],
                   "c":[0, 0, 0, 0, 50]})

>>> df

     a        b        c
   i64      i64      i64
  ------------------------
     0        0        0
     0        0        0
     0        0        0
     0        0        0
    30       40       50

In this example, I would like to drop the first 4 rows from the dataframe.

Solution

You can use remove with pl.all_horizontal applied to the condition pl.all() == 0, which is true only for rows where every column equals 0:

df.remove(pl.all_horizontal(pl.all() == 0))

Output:

shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 30  ┆ 40  ┆ 50  │
└─────┴─────┴─────┘

The same result is possible with filter on the negated condition, which keeps every row that is not all zeros:

df.filter(~pl.all_horizontal(pl.all() == 0))

# same output

To remove rows with at least one 0, use pl.any_horizontal:

df = pl.DataFrame({"a":[1, 0, 0, 0, 30],
                   "b":[0, 1, 0, 0, 40],
                   "c":[0, 0, 1, 0, 50]})

df.remove(pl.any_horizontal(pl.all() == 0))

# same output