python python-polars great-tables

How to highlight values per column in Polars


I have a Polars DataFrame, and I want to highlight the top 3 values in each column using the style and loc helpers from Great Tables (through the DataFrame's .style accessor). I can achieve this for individual columns, but my current approach involves a lot of repetition and does not scale to many columns.

import polars as pl
import polars.selectors as cs
from great_tables import loc, style 

df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, 25, 5, 10, 20],
    "variable2": [40, 30, 50, 10, 20],
    "variable3": [400, 100, 300, 200, 500]
})

top3_var1 = pl.col("variable1").is_in(pl.col("variable1").top_k(3))
top3_var2 = pl.col("variable2").is_in(pl.col("variable2").top_k(3))

(
    df
    .style
    .tab_style(
        style.text(weight="bold"),  
        loc.body("variable1", top3_var1)
    )
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable2", top3_var2)
    )
)

This works, but it's not scalable for many columns since I have to manually define top3_var for each column.

I’ve tried using pl.all().top_k(3) to make the process more automatic:

(
    df
    .style
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable1", top3_var1)
    )
    .tab_style(
        style.text(weight="bold"),
        loc.body("variable2", top3_var2)
    )
)

However, I’m not sure how to apply the style and loc methods to highlight only the top 3 values in each column individually without affecting the entire row.


Solution

  • As outlined in the comments, there are already some discussions on GitHub about adding a loc.body(mask=...) argument that would cover this use case.

    Until this feature is implemented, you can create a GT (Great Tables) object and call tab_style on it in a loop, as follows. This avoids chaining the tab_style calls by hand.

    import polars as pl
    import polars.selectors as cs
    from great_tables import GT, loc, style
    
    df = pl.DataFrame({
        "id": [1, 2, 3, 4, 5],
        "variable1": [15, 25, 5, 10, 20],
        "variable2": [40, 30, 50, 10, 20],
        "variable3": [400, 100, 300, 200, 500]
    })
    
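    # Start from a GT object and add one tab_style call per value column.
    gt = GT(df)
    for col in df.select(cs.exclude("id")).columns:
        gt = gt.tab_style(
            style.text(weight="bold"),
            # Rows where this column's value is among its 3 largest.
            loc.body(col, pl.col(col).is_in(pl.col(col).top_k(3)))
        )
    
    # Display the styled table (e.g. in a notebook).
    gt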
    
