I have a Polars DataFrame with a bunch of columns, and I need to find the column with, for example, the largest sum.
The snippet below sums all of the columns:
import polars as pl

df = pl.DataFrame(
{
"a": [0, 1, 3, 4],
"b": [0, 0, 0, 0],
"c": [1, 0, 1, 0],
}
)
max_col = df.select(pl.col(df.columns).sum())
shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 8   ┆ 0   ┆ 2   │
└─────┴─────┴─────┘
But I'm missing the last step: how do I select the column with the largest value?
I would do this as an unpivot followed by a filter.
(
df
.select(pl.all().sum())
.unpivot()
.filter(pl.col('value')==pl.col('value').max())
)
If you want the original (wide) shape, you can tack a transpose onto
the end.
(
df
.select(pl.all().sum())
.unpivot()
.filter(pl.col('value')==pl.col('value').max())
.transpose(column_names="variable")
)
The above also works when there is a tie. For instance, if you have:
df = pl.DataFrame(
{
"a": [0, 1, 3, 4],
"b": [0, 0, 0, 0],
"c": [1, 0, 1, 6],
}
)
then you'll get both 'a' and 'c' with either approach.