For example, if I have:
import polars as pl
df = pl.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
How would I find the cumulative sum across each row?
Expected output:
a b
0 1 5
1 2 7
2 3 9
Here's the equivalent in pandas:
>>> import pandas as pd
>>> pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]}).cumsum(axis=1)
a b
0 1 5
1 2 7
2 3 9
But I can't figure out how to do it in Polars.
pl.cum_sum_horizontal() generates a struct column holding the running (cumulative) sums:
df.select(pl.cum_sum_horizontal(pl.all()))
shape: (3, 1)
┌───────────┐
│ cum_sum │
│ --- │
│ struct[2] │
╞═══════════╡
│ {1,5} │
│ {2,7} │
│ {3,9} │
└───────────┘
You can then unnest() the struct to get back one column per input:
df.select(pl.cum_sum_horizontal(pl.all())).unnest('cum_sum')
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 5 │
│ 2 ┆ 7 │
│ 3 ┆ 9 │
└─────┴─────┘
The source of cum_sum_horizontal shows that, as it stands, it just calls cum_fold() with a UInt32 zero as the starting value and addition as the folding function:
df.select(pl.cum_fold(pl.lit(0, pl.UInt32), lambda x, y: x + y, pl.all()))
shape: (3, 1)
┌───────────┐
│ cum_fold │
│ --- │
│ struct[2] │
╞═══════════╡
│ {1,5} │
│ {2,7} │
│ {3,9} │
└───────────┘
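To make concrete what the fold computes for each row, here is a pure-Python model built on itertools.accumulate. The helper cum_fold_row is hypothetical and only mirrors the semantics visible in the output above (start from the accumulator, apply the function column by column, keep every intermediate value); its include_init flag imitates the keyword of the same name on pl.cum_fold. It is not how Polars implements the fold.

```python
from itertools import accumulate
from operator import add


def cum_fold_row(acc, function, row, include_init=False):
    """Hypothetical pure-Python model of a cumulative fold over one row.

    Starts from `acc`, applies `function` to the running value and each
    element in turn, and returns every intermediate result.
    """
    steps = list(accumulate(row, function, initial=acc))
    # steps[0] is the initial accumulator; drop it unless requested.
    return steps if include_init else steps[1:]


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
rows = zip(*data.values())  # row-wise view: (1, 4), (2, 5), (3, 6)
result = [cum_fold_row(0, add, row) for row in rows]
print(result)  # [[1, 5], [2, 7], [3, 9]]
```

Each inner list corresponds to one struct value in the cum_fold output above.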