In the code below I'm creating a polars- and a pandas dataframe with identical data. I want to select a set of rows based on a condition on column A
, then update the corresponding rows for column C
. I've included how I would do this with the pandas dataframe, but I'm coming up short on how to get this working with polars. The closest I've gotten is by using when-then-otherwise
, but I'm unable to use anything other than a single value in then
.
import pandas as pd
import polars as pl
df_pd = pd.DataFrame({'A': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
'B': [1, 1, 2, 2, 1, 1, 2, 2],
'C': [1, 2, 3, 4, 5, 6, 7, 8]})
df_pl = pl.DataFrame({'A': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
'B': [1, 1, 2, 2, 1, 1, 2, 2],
'C': [1, 2, 3, 4, 5, 6, 7, 8]})
df_pd.loc[df_pd['A'] == 'x', 'C'] = [-1, -2, -3, -4]
df_pl ???
If you wrap the values in a pl.lit
Series, you can index the values with Expr.get
values = pl.lit(pl.Series([-1, -2, -3, -4]))
idxs = pl.when(pl.col.A == 'x').then(1).cum_sum() - 1
df.with_columns(C = pl.coalesce(values.get(idxs), 'C'))
shape: (8, 3)
┌─────┬─────┬─────┐
│ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ x ┆ 1 ┆ -1 │
│ x ┆ 1 ┆ -2 │
│ x ┆ 2 ┆ -3 │
│ x ┆ 2 ┆ -4 │
│ y ┆ 1 ┆ 5 │
│ y ┆ 1 ┆ 6 │
│ y ┆ 2 ┆ 7 │
│ y ┆ 2 ┆ 8 │
└─────┴─────┴─────┘
These are the steps expanded.
The indices are created, used to .get()
and .coalesce()
combines in the values from the other column.
df.with_columns(
idxs = idxs,
values = values.get(idxs),
D = pl.coalesce(values.get(idxs), 'C')
)
shape: (8, 6)
┌─────┬─────┬─────┬──────┬────────┬─────┐
│ A ┆ B ┆ C ┆ idxs ┆ values ┆ D │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i32 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪══════╪════════╪═════╡
│ x ┆ 1 ┆ 1 ┆ 0 ┆ -1 ┆ -1 │
│ x ┆ 1 ┆ 2 ┆ 1 ┆ -2 ┆ -2 │
│ x ┆ 2 ┆ 3 ┆ 2 ┆ -3 ┆ -3 │
│ x ┆ 2 ┆ 4 ┆ 3 ┆ -4 ┆ -4 │
│ y ┆ 1 ┆ 5 ┆ null ┆ null ┆ 5 │
│ y ┆ 1 ┆ 6 ┆ null ┆ null ┆ 6 │
│ y ┆ 2 ┆ 7 ┆ null ┆ null ┆ 7 │
│ y ┆ 2 ┆ 8 ┆ null ┆ null ┆ 8 │
└─────┴─────┴─────┴──────┴────────┴─────┘