python pandas python-polars

Take cumsum of each row in polars


For example, given

import polars as pl
df = pl.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

how would I find the cumulative sum of each row?

Expected output:

    a   b
0   1   5
1   2   7
2   3   9

Here's the equivalent in pandas:

>>> import pandas as pd
>>> pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]}).cumsum(axis=1)
    a   b
0   1   5
1   2   7
2   3   9

but I can't figure out how to do it in polars.


Solution

  • pl.cum_sum_horizontal() returns a single struct column holding the running sum across the given columns, one field per input column.

    df.select(pl.cum_sum_horizontal(pl.all()))
    
    shape: (3, 1)
    ┌───────────┐
    │ cum_sum   │
    │ ---       │
    │ struct[2] │
    ╞═══════════╡
    │ {1,5}     │
    │ {2,7}     │
    │ {3,9}     │
    └───────────┘
    

    You can then unnest() the struct to get regular columns back:

    df.select(pl.cum_sum_horizontal(pl.all())).unnest('cum_sum')
    
    shape: (3, 2)
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 1   ┆ 5   │
    │ 2   ┆ 7   │
    │ 3   ┆ 9   │
    └─────┴─────┘
    

    In the Polars source, cum_sum_horizontal() currently just calls cum_fold(), which you can also use directly:

    df.select(pl.cum_fold(pl.lit(0, pl.UInt32), lambda x, y: x + y, pl.all()))
    
    shape: (3, 1)
    ┌───────────┐
    │ cum_fold  │
    │ ---       │
    │ struct[2] │
    ╞═══════════╡
    │ {1,5}     │
    │ {2,7}     │
    │ {3,9}     │
    └───────────┘
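
To make the fold semantics concrete, here is a plain-Python sketch of what a cumulative fold does to a single row. It is not how Polars implements it (Polars works on whole columns, not row by row); cum_fold_row is a hypothetical helper used only for illustration.

```python
from itertools import accumulate
from operator import add

# Rows of the example frame, as (a, b) tuples.
rows = [(1, 4), (2, 5), (3, 6)]

def cum_fold_row(row, func=add, init=0):
    # accumulate() yields the initial value first, then every intermediate
    # result; drop the initial value so there is one output per input column.
    return tuple(accumulate(row, func, initial=init))[1:]

result = [cum_fold_row(r) for r in rows]
print(result)  # [(1, 5), (2, 7), (3, 9)]
```

Each tuple corresponds to one struct in the output above: the accumulator starts at 0, and every column's value is folded in from left to right.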