In pandas, I can use transform
to generate the codeindex
column:
day = ['day1','day2','day3','day4','day1','day2','day3','day1','day2']
code = ["a","a","a","a","b","b","b","c","c"]
price = [1,2,3,4,5,6,7,8,9]
df = pd.DataFrame({"date":day,"code":code,"price":price})
df['codeindex'] = df.groupby('code')['date'].transform(lambda x: range(0, len(x), 1))
How would I generate the codeindex
column using Polars?
Expected output:
┌──────┬──────┬───────┬───────────┐
│ date ┆ code ┆ price ┆ codeindex │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞══════╪══════╪═══════╪═══════════╡
│ day1 ┆ a ┆ 1 ┆ 0 │
│ day2 ┆ a ┆ 2 ┆ 1 │
│ day3 ┆ a ┆ 3 ┆ 2 │
│ day4 ┆ a ┆ 4 ┆ 3 │
│ day1 ┆ b ┆ 5 ┆ 0 │
│ day2 ┆ b ┆ 6 ┆ 1 │
│ day3 ┆ b ┆ 7 ┆ 2 │
│ day1 ┆ c ┆ 8 ┆ 0 │
│ day2 ┆ c ┆ 9 ┆ 1 │
└──────┴──────┴───────┴───────────┘
You can use window expressions
to deal with expressions that only need to be applied within a group.
A windows expression operates on groups you partition by with .over()
. It is defined by an expression part like col("date").cum_count()
and a partition part defined by .over("code")
.
If you use an aggregation
the result will be broadcasted to match the size of the group.
The code looks like this:
day = ['day1','day2','day3','day4','day1','day2','day3','day1','day2']
code = ["a","a","a","a","b","b","b","c","c"]
price = [1,2,3,4,5,6,7,8,9]
df = pl.DataFrame({"date":day,"code":code,"price":price})
(df.select(
pl.all(),
pl.col("date").cum_count().over("code").alias("codeindex"),
))
shape: (9, 4)
┌──────┬──────┬───────┬───────────┐
│ date ┆ code ┆ price ┆ codeindex │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ u32 │
╞══════╪══════╪═══════╪═══════════╡
│ day1 ┆ a ┆ 1 ┆ 1 │
│ day2 ┆ a ┆ 2 ┆ 2 │
│ day3 ┆ a ┆ 3 ┆ 3 │
│ day4 ┆ a ┆ 4 ┆ 4 │
│ day1 ┆ b ┆ 5 ┆ 1 │
│ day2 ┆ b ┆ 6 ┆ 2 │
│ day3 ┆ b ┆ 7 ┆ 3 │
│ day1 ┆ c ┆ 8 ┆ 1 │
│ day2 ┆ c ┆ 9 ┆ 2 │
└──────┴──────┴───────┴───────────┘