I have a Polars DataFrame, and I want to highlight the top 3 values in each column using the style and loc features of Great Tables (available through df.style in Polars). I can achieve this for individual columns, but my current approach involves a lot of repetition and does not scale to many variables.

import polars as pl
import polars.selectors as cs
from great_tables import loc, style

df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, 25, 5, 10, 20],
    "variable2": [40, 30, 50, 10, 20],
    "variable3": [400, 100, 300, 200, 500]
})

top3_var1 = pl.col("variable1").is_in(pl.col("variable1").top_k(3))
top3_var2 = pl.col("variable2").is_in(pl.col("variable2").top_k(3))

(
    df
    .style
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable1", top3_var1)
    )
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable2", top3_var2)
    )
)

This works, but it does not scale to many columns, since I have to manually define a top3_var expression for each column.
I’ve tried using pl.all().top_k(3) to make the process more automatic:

(
    df
    .style
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable1", top3_var1)
    )
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable2", top3_var2)
    )
)

However, I’m not sure how to apply the style and loc methods to highlight only the top 3 values in each column individually without affecting the entire row.
As outlined in the comments, there are already some discussions on GitHub about adding a loc.body(mask=...) argument suited to this use case.
Until this feature is implemented, you can create a GT (Great Tables) object and call gt.tab_style in a loop, as follows. This avoids manually chaining the tab_style calls.
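
import polars as pl
import polars.selectors as cs
from great_tables import GT, loc, style

df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, 25, 5, 10, 20],
    "variable2": [40, 30, 50, 10, 20],
    "variable3": [400, 100, 300, 200, 500]
})

gt = GT(df)

# Apply one tab_style call per non-id column, reassigning gt each time.
for col in df.select(cs.exclude("id")).columns:
    gt = gt.tab_style(
        style.text(weight="bold"),
        # Rows where the value is among the 3 largest values of this column.
        loc.body(col, pl.col(col).is_in(pl.col(col).top_k(3)))
    )

# Display the styled table (e.g. as the last expression in a notebook cell).
gt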