Drop rows with all zeros in Polars DataFrame


I can use the drop_nans() function to remove rows in which some or all columns are NaN.

Is there an equivalent function for dropping rows in which every column has the value 0?

import polars as pl
df = pl.DataFrame({"a":[0, 0, 0, 0, 30], 
                   "b":[0, 0, 0, 0, 40],
                   "c":[0, 0, 0, 0, 50]})

>>> df

     a        b        c
   i64      i64      i64
  ------------------------
     0        0        0
     0        0        0
     0        0        0
     0        0        0
    30       40       50

In this example, I would like to drop the first 4 rows from the dataframe.


Solution

  • Use DataFrame.remove with pl.all_horizontal, applied to the condition pl.all() == 0, so that a row is removed only when every column is 0:

    df.remove(pl.all_horizontal(pl.all() == 0))
    

    Output:

    shape: (1, 3)
    ┌─────┬─────┬─────┐
    │ a   ┆ b   ┆ c   │
    │ --- ┆ --- ┆ --- │
    │ i64 ┆ i64 ┆ i64 │
    ╞═════╪═════╪═════╡
    │ 30  ┆ 40  ┆ 50  │
    └─────┴─────┴─────┘
    

    The same result is possible with filter on the negated condition:

    df.filter(~pl.all_horizontal(pl.all() == 0))
    
    # same output
    

    To remove rows with at least one 0, use pl.any_horizontal:

    df = pl.DataFrame({"a":[1, 0, 0, 0, 30],
                       "b":[0, 1, 0, 0, 40],
                       "c":[0, 0, 1, 0, 50]})
    
    df.remove(pl.any_horizontal(pl.all() == 0))
    
    # same output: only the row (30, 40, 50) remains