I have the following code:
import polars as pl
df = pl.DataFrame({
'name': ['CHECK', 'CASH', 'BAR', 'SET'],
'category1': ['AM', 'EU', 'EU', 'AS'],
'category2': ['CA', 'FR', 'DE', 'CX'],
'quantity': [100, -20, 10, -70],
'exposure': [11, -3, 2, 8]
})
FLT_COLS = ['quantity', 'exposure']
OTHER_COLS = [c for c in df.columns if c not in FLT_COLS]
df_temp = df.select(pl.col(FLT_COLS)).sum()\
.with_columns(pl.lit('TOTAL').alias(OTHER_COLS[0]))\
.with_columns([pl.lit('').alias(c) for c in OTHER_COLS[1:]])[df.columns]
pl.concat([df, df_temp])
which gives me the desired output
shape: (5, 5)
┌───────┬───────────┬───────────┬──────────┬──────────┐
│ name ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 ┆ i64 │
╞═══════╪═══════════╪═══════════╪══════════╪══════════╡
│ CHECK ┆ AM ┆ CA ┆ 100 ┆ 11 │
│ CASH ┆ EU ┆ FR ┆ -20 ┆ -3 │
│ BAR ┆ EU ┆ DE ┆ 10 ┆ 2 │
│ SET ┆ AS ┆ CX ┆ -70 ┆ 8 │
│ TOTAL ┆ ┆ ┆ 20 ┆ 18 │
└───────┴───────────┴───────────┴──────────┴──────────┘
That is, add a row that contains the total sum across a specific list of columns FLT_COLS
, labels the other first column TOTAL
and then puts ""
into the remaining non-summed columns.
Is there a nicer way to add this row? I feel my code looks very clumsy. I also don't like that I have to specify [df.columns]
to re-order the columns, as this feels very inefficient.
Not sure if it is answers entirely, but:
The frame-level .sum()
will give null
for all the String columns:
>>> df.sum()
shape: (1, 5)
┌──────┬───────────┬───────────┬──────────┬──────────┐
│ name ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 ┆ i64 │
╞══════╪═══════════╪═══════════╪══════════╪══════════╡
│ null ┆ null ┆ null ┆ 20 ┆ 18 │
└──────┴───────────┴───────────┴──────────┴──────────┘
If name
is known, you can then fill_null()
and concat()
pl.concat([df, df.sum().fill_null("").with_columns(name=pl.lit("TOTAL"))])
shape: (5, 5)
┌───────┬───────────┬───────────┬──────────┬──────────┐
│ name ┆ category1 ┆ category2 ┆ quantity ┆ exposure │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ i64 ┆ i64 │
╞═══════╪═══════════╪═══════════╪══════════╪══════════╡
│ CHECK ┆ AM ┆ CA ┆ 100 ┆ 11 │
│ CASH ┆ EU ┆ FR ┆ -20 ┆ -3 │
│ BAR ┆ EU ┆ DE ┆ 10 ┆ 2 │
│ SET ┆ AS ┆ CX ┆ -70 ┆ 8 │
│ TOTAL ┆ ┆ ┆ 20 ┆ 18 │
└───────┴───────────┴───────────┴──────────┴──────────┘
Or if like the example, it is always the "first column", you could use pl.first()
/ pl.nth()
to add in the TOTAL
string.
pl.concat([
df,
df.sum().fill_null("").with_columns(
pl.nth(0).replace("", "TOTAL")
)
])