
Is there a function in Python Polars similar to 'transform' in pandas?


In pandas, I can use transform to generate the codeindex column:

import pandas as pd

day = ['day1','day2','day3','day4','day1','day2','day3','day1','day2']
code = ["a","a","a","a","b","b","b","c","c"]
price = [1,2,3,4,5,6,7,8,9]

df = pd.DataFrame({"date":day,"code":code,"price":price})

# Number the rows within each code group, starting from 0
df['codeindex'] = df.groupby('code')['date'].transform(lambda x: range(0, len(x), 1))

How would I generate the codeindex column using Polars?

Expected output:

┌──────┬──────┬───────┬───────────┐
│ date ┆ code ┆ price ┆ codeindex │
│ ---  ┆ ---  ┆ ---   ┆ ---       │
│ str  ┆ str  ┆ i64   ┆ i64       │
╞══════╪══════╪═══════╪═══════════╡
│ day1 ┆ a    ┆ 1     ┆ 0         │
│ day2 ┆ a    ┆ 2     ┆ 1         │
│ day3 ┆ a    ┆ 3     ┆ 2         │
│ day4 ┆ a    ┆ 4     ┆ 3         │
│ day1 ┆ b    ┆ 5     ┆ 0         │
│ day2 ┆ b    ┆ 6     ┆ 1         │
│ day3 ┆ b    ┆ 7     ┆ 2         │
│ day1 ┆ c    ┆ 8     ┆ 0         │
│ day2 ┆ c    ┆ 9     ┆ 1         │
└──────┴──────┴───────┴───────────┘

Solution

  • You can use window expressions for computations that only need to be applied within a group.

    A window expression operates on the groups you partition by with .over(). It consists of an expression part, such as col("date").cum_count(), and a partition part, defined by .over("code").

    If you use an aggregation, the result is broadcast to match the size of the group.

    The code looks like this:

    import polars as pl

    day = ['day1','day2','day3','day4','day1','day2','day3','day1','day2']
    code = ["a","a","a","a","b","b","b","c","c"]
    price = [1,2,3,4,5,6,7,8,9]

    df = pl.DataFrame({"date":day,"code":code,"price":price})

    # Running count of rows within each "code" partition
    (df.select(
        pl.all(),
        pl.col("date").cum_count().over("code").alias("codeindex"),
    ))
    

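    This outputs the following. Note that cum_count() counts from 1 here, so subtract 1 from the expression if you need the zero-based codeindex shown in the question: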

    shape: (9, 4)
    ┌──────┬──────┬───────┬───────────┐
    │ date ┆ code ┆ price ┆ codeindex │
    │ ---  ┆ ---  ┆ ---   ┆ ---       │
    │ str  ┆ str  ┆ i64   ┆ u32       │
    ╞══════╪══════╪═══════╪═══════════╡
    │ day1 ┆ a    ┆ 1     ┆ 1         │
    │ day2 ┆ a    ┆ 2     ┆ 2         │
    │ day3 ┆ a    ┆ 3     ┆ 3         │
    │ day4 ┆ a    ┆ 4     ┆ 4         │
    │ day1 ┆ b    ┆ 5     ┆ 1         │
    │ day2 ┆ b    ┆ 6     ┆ 2         │
    │ day3 ┆ b    ┆ 7     ┆ 3         │
    │ day1 ┆ c    ┆ 8     ┆ 1         │
    │ day2 ┆ c    ┆ 9     ┆ 2         │
    └──────┴──────┴───────┴───────────┘