pythondataframeparallel-processingpython-polars

How to conditionally update a column in a Polars DataFrame with values from a list in Python?


I am trying to update specific rows in a DataFrame where two columns ("Season" and "Wk") meet certain conditions, using values from a list or Series that should align with the filtered rows. In , I would use .loc[] to do this, but I haven't found a way to achieve the same result with Polars.

import polars as pl

# Sample DataFrame
df = pl.DataFrame({
    "Season": [2024, 2024, 2024, 2024],
    "Wk": [28, 28, 29, 29],
    "position": [1, 2, 3, 4]
})

# List of new values for the filtered rows
new_positions = [10, 20]  # Example values aligned with the filtered rows

# Condition to update only where Season is 2024 and Wk is 29
df = df.with_columns(
    # Expected update logic here
)

I've tried approaches with when().then() and zip_with(), but they either don't accept lists/Series directly or don't align the new values correctly with the filtered rows. Is there a recommended way to update a column conditionally in with values from a list or Series in the same order as the filter?


Solution

  • You could generate the indices with when/then and .cum_sum()

    df.with_columns(
       idx = pl.when(Season=2024, Wk=29).then(1).cum_sum() - 1
    )
    
    shape: (4, 4)
    ┌────────┬─────┬──────────┬──────┐
    │ Season ┆ Wk  ┆ position ┆ idx  │
    │ ---    ┆ --- ┆ ---      ┆ ---  │
    │ i64    ┆ i64 ┆ i64      ┆ i32  │
    ╞════════╪═════╪══════════╪══════╡
    │ 2024   ┆ 28  ┆ 1        ┆ null │
    │ 2024   ┆ 28  ┆ 2        ┆ null │
    │ 2024   ┆ 29  ┆ 3        ┆ 0    │
    │ 2024   ┆ 29  ┆ 4        ┆ 1    │
    └────────┴─────┴──────────┴──────┘
    

    Which you can .get() and then .fill_null() the remaining original values.

    df.with_columns(
       pl.lit(pl.Series(new_positions))
         .get(pl.when(Season=2024, Wk=29).then(1).cum_sum() - 1)
         .fill_null(pl.col.position)
         .alias("position")
    )
    
    shape: (4, 3)
    ┌────────┬─────┬──────────┐
    │ Season ┆ Wk  ┆ position │
    │ ---    ┆ --- ┆ ---      │
    │ i64    ┆ i64 ┆ i64      │
    ╞════════╪═════╪══════════╡
    │ 2024   ┆ 28  ┆ 1        │
    │ 2024   ┆ 28  ┆ 2        │
    │ 2024   ┆ 29  ┆ 10       │
    │ 2024   ┆ 29  ┆ 20       │
    └────────┴─────┴──────────┘
    

    Or perhaps .replace_strict()

    df.with_columns(
       (pl.when(Season=2024, Wk=29).then(1).cum_sum() - 1)
          .replace_strict(dict(enumerate(new_positions)), default=pl.col.position)
          .alias("position")
    )