I have this code:
import polars as pl
pl.DataFrame({
'label': ['AA', 'CC', 'BB', 'AA', 'CC'],
'account': ['EU', 'US', 'US', 'EU', 'EU'],
'qty': [1.5, 43.2, None, None, 18.9]})\
.pivot('account', index='label', aggregate_function='sum')
which gives
shape: (3, 3)
┌───────┬──────┬──────┐
│ label ┆ EU ┆ US │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞═══════╪══════╪══════╡
│ AA ┆ 1.5 ┆ null │
│ CC ┆ 18.9 ┆ 43.2 │
│ BB ┆ null ┆ 0.0 │
└───────┴──────┴──────┘
Now, when there are any null
values in the original data, I want the pivot table to show null
in the respective cell. However, AA-EU shows 1.5 (but should be null), and BB-US shows 0.0 (but should also be null).
I tried using
aggregate_function=lambda col: pl.when(col.has_nulls())\
.then(pl.lit(None, dtype=pl.Float64))\
.otherwise(pl.sum(col))
but it errors out with AttributeError: 'function' object has no attribute '_pyexpr'
.
How can I fix this?
You can use a Polars expression as an aggregate function:
df.pivot('account', index='label', aggregate_function=
pl.when(~pl.element().has_nulls()).then(pl.element().sum()))
shape: (3, 3)
┌───────┬──────┬──────┐
│ label ┆ EU ┆ US │
│ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 │
╞═══════╪══════╪══════╡
│ AA ┆ null ┆ null │
│ CC ┆ 18.9 ┆ 43.2 │
│ BB ┆ null ┆ null │
└───────┴──────┴──────┘