I'm slowly migrating to polars from pandas and I have found that in some cases the polars syntax is tricky.
I'm seeking help to do a group_by
followed by a describe
using less (or more readable) code.
See this example:
from io import BytesIO
import pandas as pd
import polars as pl
S = b'''group,value\n3,245\n3,28\n3,48\n1,113\n1,288\n1,165\n2,90\n2,21\n2,109'''
pl_df = pl.read_csv(BytesIO(S))
pd_df = pd.read_csv(BytesIO(S))
# Polars' way
pl_df.group_by('group').map_groups(
lambda df: (
df['value']
.describe()
.with_columns(
group=pl.lit(df['group'][0])
)
)
).pivot(index='group', on='statistic')
Something similar in pandas would be:
# pandas'
pd_df.groupby('group').value.describe()
You can write a quick function that returns a mapping of expressions that you can unpack right into your DataFrame.group_by(...).agg
. This avoids any slow-ness of using map_groups
and enables Polars to easily scan the query for any optimizations (provided you are working with a LazyFrame
).
from io import BytesIO
import pandas as pd
import polars as pl
def describe(column, percentiles=[.25, .5, .75]):
return {
'count': column.count(),
'null_count': column.null_count(),
'mean': column.mean(),
'std': column.std(),
'min': column.min(),
**{
f'{pct*100:g}%': column.quantile(pct)
for pct in percentiles
},
'max': column.max(),
}
S = b'''group,value\n3,245\n3,28\n3,48\n1,113\n1,288\n1,165\n2,90\n2,21\n2,109'''
pl_df = pl.read_csv(BytesIO(S))
print(
pl_df.group_by('group').agg(**describe(pl.col('value')))
)
Produces
shape: (3, 10)
┌───────┬───────┬────────────┬────────────┬───┬───────┬───────┬───────┬─────┐
│ group ┆ count ┆ null_count ┆ mean ┆ … ┆ 25% ┆ 50% ┆ 75% ┆ max │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ u32 ┆ u32 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ i64 │
╞═══════╪═══════╪════════════╪════════════╪═══╪═══════╪═══════╪═══════╪═════╡
│ 1 ┆ 3 ┆ 0 ┆ 188.666667 ┆ … ┆ 165.0 ┆ 165.0 ┆ 288.0 ┆ 288 │
│ 3 ┆ 3 ┆ 0 ┆ 107.0 ┆ … ┆ 48.0 ┆ 48.0 ┆ 245.0 ┆ 245 │
│ 2 ┆ 3 ┆ 0 ┆ 73.333333 ┆ … ┆ 90.0 ┆ 90.0 ┆ 109.0 ┆ 109 │
└───────┴───────┴────────────┴────────────┴───┴───────┴───────┴───────┴─────┘