pythonpython-polarsgreat-tables

How to conditionally format data in Great Tables?


I am trying to conditionally format table data using Great Tables but not sure how to do it.

To highlight the color of all those cells (sort of heatmap) whose values is higher than Upper Range column.

Data:

import polars as pl

gt_sample_df = pl.DataFrame({'Test': ['Test A','Test B','Test C','Test D','Test Z','Test E','Test F','Test X',
'Test G','Test H','Test I','Test J'],
 'Lower Range': [35.3,2.5,85.0,0.0,None,3.0,200.0,None,3.0,400.0,None,7.0],
 'Upper Range': [79.5,3.5,150.0,160.0,160.0,5.0,None,200.0,5.0,1000.0,150.0,30.0],
 '2024-11-10': [43.0,3.14,135.82,162.7,None,None,206.0,None,4.76,519.52,134.4,26.88],
 '2024-08-03': [36.0,4.31,152.98,None,175.5,5.94,None,211.0,None,512.08,112.6,22.52],
 '2024-06-17': [47.0,3.38,158.94,None,182.0,4.87,None,229.0,None,550.24,115.3,23.06],
 '2024-02-01': [44.0,3.12,136.84,None,154.1,4.51,None,198.0,None,465.04,86.3,17.26],
 '2023-10-16': [45.0,3.11,140.14,None,162.0,4.6,None,207.0,None,501.44,109.3,21.86],
 '2023-05-15': [42.0,3.8,159.58,None,192.0,5.57,None,234.0,None,597.68,162.1,32.42]})

gt_sample_df

enter image description here

The various date columns in this dataframe gt_sample_df contain the results and I want to compare with the Upper Range and highlight those whose values are higher than Upper Range column. There can be n number of date columns with any date so I can't use static names for columns.

I have tried:

from great_tables import GT, md, style, loc, google_font

(GT(gt_sample_df)
 .tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
        locations=loc.body(columns=pl.exclude(["Test",'Lower Range','Upper Range']), 
                           rows=pl.col(lambda x:  x) > pl.col('Upper Range')))
        )
from great_tables import GT, md, style, loc, google_font

(GT(gt_sample_df)
 .tab_style(style=[style.text(color="Navy"), style.fill(color="red")],
        locations=loc.body(columns=[3:], 
                           rows=pl.col(lambda x:  x) > pl.col('Upper Range')))
        )

As I only want to highlight values greater in the date columns so I was trying to exclude first 3 columns in the column selection but it didn't work and I am not sure how to automatically compare values of all other Date columns to Upper Range column.

Update: Column selection I am able to do it but not able to select proper rows

columns_required = gt_sample_df.select(pl.exclude(["Test",'Lower Range','Upper Range'])).columns

(GT(gt_sample_df)
 .tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
        
        locations=loc.body(columns=columns_required, rows=pl.col(lambda x:  x) > pl.col('Upper Range')))
)
import polars.selectors as cs

(GT(gt_sample_df)
 .tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
        
        locations=loc.body(columns=cs.starts_with("20"), rows=pl.col(lambda x:  x) > pl.col('Upper Range')))
)
columns_required = gt_sample_df.select(pl.exclude(["Test",'Lower Range','Upper Range'])).columns

(GT(gt_sample_df)
 .tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
        for col_name in columns_required:
        
            locations=loc.body(columns=[col_name], 
                            rows=pl.col(col_name) > pl.col('Upper Range')))
)

this also didn't work.

Desired Output probably something like this:

enter image description here


Solution

  • You were pretty close in your last snippet! For the locations argument to accept a list, it needs to be done in a list comprehension, or ahead of time, outside the tab_style method call.

    Link to an example of this in the docs can be found here

    import polars as pl
    from great_tables import GT, loc, style
    
    # define `gt_sample_df` as per example snippet
    
    required_columns = gt_sample_df.drop("Test", "Lower Range", "Upper Range").columns
    
    (
        GT(gt_sample_df)
        # `gt_sample_df.style` works here too
        .tab_style(
            style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
            locations=[
                loc.body(columns=col_name, rows=pl.col(col_name) > pl.col("Upper Range"))
                for col_name in required_columns
            ],
        )
    )
    

    great table output

    Edit: See this answer and this discussion on github for related solutions/discussions.

    Disclaimer: I am no great tables expert, so it may be possible there is a better way to do this