Conditional assignment in polars dataframe


Is there a way to do conditional assignment in a polars DataFrame without relying on numpy?

import polars as pl
import numpy as np

df = pl.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'conference': ['East', 'East', 'East', 'West', 'West', 'East'],
                   'points': [11, 8, 10, 6, 6, 5],
                   'rebounds': [7, 7, 6, 9, 12, 8]})
shape: (6, 4)
┌──────┬────────────┬────────┬──────────┐
│ team ┆ conference ┆ points ┆ rebounds │
│ ---  ┆ ---        ┆ ---    ┆ ---      │
│ str  ┆ str        ┆ i64    ┆ i64      │
╞══════╪════════════╪════════╪══════════╡
│ A    ┆ East       ┆ 11     ┆ 7        │
│ A    ┆ East       ┆ 8      ┆ 7        │
│ A    ┆ East       ┆ 10     ┆ 6        │
│ B    ┆ West       ┆ 6      ┆ 9        │
│ B    ┆ West       ┆ 6      ┆ 12       │
│ C    ┆ East       ┆ 5      ┆ 8        │
└──────┴────────────┴────────┴──────────┘

Using numpy, we could do:

conditions = [
    df['points'].le(6) & df['rebounds'].le(9),
    df['points'].gt(10) & df['rebounds'].gt(6)
]
choicelist = ['Bad','Good']

df.with_columns(rating = np.select(conditions, choicelist, 'Aveg'))
shape: (6, 5)
┌──────┬────────────┬────────┬──────────┬────────┐
│ team ┆ conference ┆ points ┆ rebounds ┆ rating │
│ ---  ┆ ---        ┆ ---    ┆ ---      ┆ ---    │
│ str  ┆ str        ┆ i64    ┆ i64      ┆ str    │
╞══════╪════════════╪════════╪══════════╪════════╡
│ A    ┆ East       ┆ 11     ┆ 7        ┆ Good   │
│ A    ┆ East       ┆ 8      ┆ 7        ┆ Aveg   │
│ A    ┆ East       ┆ 10     ┆ 6        ┆ Aveg   │
│ B    ┆ West       ┆ 6      ┆ 9        ┆ Bad    │
│ B    ┆ West       ┆ 6      ┆ 12       ┆ Aveg   │
│ C    ┆ East       ┆ 5      ┆ 8        ┆ Bad    │
└──────┴────────────┴────────┴──────────┴────────┘

Solution

  • You can chain when -> then -> otherwise expressions. Conditions are checked in order, and the first one that matches determines the value.

    df.with_columns(
        pl.when((pl.col("points") <= 6) & (pl.col("rebounds") <= 9))
        .then(pl.lit("Bad"))
        .when((pl.col("points") > 10) & (pl.col("rebounds") > 6))
        .then(pl.lit("Good"))
        .otherwise(pl.lit("Aveg"))
        .alias("rating")
    )
    
    shape: (6, 5)
    ┌──────┬────────────┬────────┬──────────┬────────┐
    │ team ┆ conference ┆ points ┆ rebounds ┆ rating │
    │ ---  ┆ ---        ┆ ---    ┆ ---      ┆ ---    │
    │ str  ┆ str        ┆ i64    ┆ i64      ┆ str    │
    ╞══════╪════════════╪════════╪══════════╪════════╡
    │ A    ┆ East       ┆ 11     ┆ 7        ┆ Good   │
    │ A    ┆ East       ┆ 8      ┆ 7        ┆ Aveg   │
    │ A    ┆ East       ┆ 10     ┆ 6        ┆ Aveg   │
    │ B    ┆ West       ┆ 6      ┆ 9        ┆ Bad    │
    │ B    ┆ West       ┆ 6      ┆ 12       ┆ Aveg   │
    │ C    ┆ East       ┆ 5      ┆ 8        ┆ Bad    │
    └──────┴────────────┴────────┴──────────┴────────┘
    

    when also accepts multiple conditions as positional arguments (*args), which are implicitly combined using &. That may be preferable in this case, since it avoids the extra parentheses:

    .when(pl.col("points") <= 6, pl.col("rebounds") <= 9)