sumpython-polars

Add total row to polars dataframe (to a subset of columns)


I have the following code:

import polars as pl

df = pl.DataFrame({
    'name':      ['CHECK', 'CASH', 'BAR', 'SET'],
    'category1': ['AM', 'EU', 'EU', 'AS'],
    'category2': ['CA', 'FR', 'DE', 'CX'],
    'quantity':  [100, -20, 10, -70],
    'exposure':  [11, -3, 2, 8]
})
FLT_COLS   = ['quantity', 'exposure']
OTHER_COLS = [c for c in df.columns if c not in FLT_COLS]

df_temp = df.select(pl.col(FLT_COLS)).sum()\
            .with_columns(pl.lit('TOTAL').alias(OTHER_COLS[0]))\
            .with_columns([pl.lit('').alias(c) for c in OTHER_COLS[1:]])[df.columns]

pl.concat([df, df_temp])

which gives me the desired output

shape: (5, 5)
┌───────┬───────────┬───────────┬──────────┬──────────┐
│ name  ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
│ ---   ┆ ---       ┆ ---       ┆ ---      ┆ ---      │
│ str   ┆ str       ┆ str       ┆ i64      ┆ i64      │
╞═══════╪═══════════╪═══════════╪══════════╪══════════╡
│ CHECK ┆ AM        ┆ CA        ┆ 100      ┆ 11       │
│ CASH  ┆ EU        ┆ FR        ┆ -20      ┆ -3       │
│ BAR   ┆ EU        ┆ DE        ┆ 10       ┆ 2        │
│ SET   ┆ AS        ┆ CX        ┆ -70      ┆ 8        │
│ TOTAL ┆           ┆           ┆ 20       ┆ 18       │
└───────┴───────────┴───────────┴──────────┴──────────┘

That is, add a row that contains the total sum across a specific list of columns FLT_COLS, labels the other first column TOTAL and then puts "" into the remaining non-summed columns.

Is there a nicer way to add this row? I feel my code looks very clumsy. I also don't like that I have to specify [df.columns] to re-order the columns, as this feels very inefficient.


Solution

  • Not sure if it is answers entirely, but:

    The frame-level .sum() will give null for all the String columns:

    >>> df.sum()
    shape: (1, 5)
    ┌──────┬───────────┬───────────┬──────────┬──────────┐
    │ name ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
    │ ---  ┆ ---       ┆ ---       ┆ ---      ┆ ---      │
    │ str  ┆ str       ┆ str       ┆ i64      ┆ i64      │
    ╞══════╪═══════════╪═══════════╪══════════╪══════════╡
    │ null ┆ null      ┆ null      ┆ 20       ┆ 18       │
    └──────┴───────────┴───────────┴──────────┴──────────┘
    

    If name is known, you can then fill_null() and concat()

    pl.concat([df, df.sum().fill_null("").with_columns(name=pl.lit("TOTAL"))])
    
    shape: (5, 5)
    ┌───────┬───────────┬───────────┬──────────┬──────────┐
    │ name  ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
    │ ---   ┆ ---       ┆ ---       ┆ ---      ┆ ---      │
    │ str   ┆ str       ┆ str       ┆ i64      ┆ i64      │
    ╞═══════╪═══════════╪═══════════╪══════════╪══════════╡
    │ CHECK ┆ AM        ┆ CA        ┆ 100      ┆ 11       │
    │ CASH  ┆ EU        ┆ FR        ┆ -20      ┆ -3       │
    │ BAR   ┆ EU        ┆ DE        ┆ 10       ┆ 2        │
    │ SET   ┆ AS        ┆ CX        ┆ -70      ┆ 8        │
    │ TOTAL ┆           ┆           ┆ 20       ┆ 18       │
    └───────┴───────────┴───────────┴──────────┴──────────┘
    

    Or if like the example, it is always the "first column", you could use pl.first() / pl.nth() to add in the TOTAL string.

    pl.concat([
        df,
        df.sum().fill_null("").with_columns(
            pl.nth(0).replace("", "TOTAL")
        )
    ])