I have a dataset with an ID column and multiple value columns. The values for each ID can differ in magnitude and range across these columns, so I want to normalize the columns separately for each ID.
import polars as pl
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df = pl.DataFrame(
    {"ID": [1, 1, 2, 2, 3, 3],
     "Values": [1, 2, 3, 4, 5, 6]}
)
If I do this, it fits the scaler on the entire DataFrame, but I would like to apply the scaler separately for each ID.
I tried this:
(
    df
    .with_columns(
        Value_scaled = scaler.fit_transform(df.select(pl.col("Values"))).over("ID"),
    )
)
But : AttributeError: 'numpy.ndarray' object has no attribute 'over'
I also tried using group_by():
(
    df
    .group_by(
        pl.col("ID")
    ).agg(
        scaler.fit_transform(pl.col("Values")).alias("Value_scaled")
    )
)
And I get:
TypeError: float() argument must be a string or a real number, not 'Expr'
Following the definition outlined in the scikit-learn documentation, the functionality of MinMaxScaler can be implemented easily using polars' native expression API.
def min_max_scaler(x: str | pl.Expr) -> pl.Expr:
    if isinstance(x, str):
        x = pl.col(x)
    return (x - x.min()) / (x.max() - x.min())
It is then compatible with polars' window functions, such as pl.Expr.over, to apply the scaling separately for each ID.
df.with_columns(min_max_scaler("Values").over("ID"))
shape: (6, 2)
┌─────┬────────┐
│ ID ┆ Values │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪════════╡
│ 1 ┆ 0.0 │
│ 1 ┆ 1.0 │
│ 2 ┆ 0.0 │
│ 2 ┆ 1.0 │
│ 3 ┆ 0.0 │
│ 3 ┆ 1.0 │
└─────┴────────┘