I am trying to conditionally format table data using Great Tables but not sure how to do it.
To highlight the color of all those cells (sort of heatmap) whose values is higher than Upper Range column.
Data:
import polars as pl
gt_sample_df = pl.DataFrame({'Test': ['Test A','Test B','Test C','Test D','Test Z','Test E','Test F','Test X',
'Test G','Test H','Test I','Test J'],
'Lower Range': [35.3,2.5,85.0,0.0,None,3.0,200.0,None,3.0,400.0,None,7.0],
'Upper Range': [79.5,3.5,150.0,160.0,160.0,5.0,None,200.0,5.0,1000.0,150.0,30.0],
'2024-11-10': [43.0,3.14,135.82,162.7,None,None,206.0,None,4.76,519.52,134.4,26.88],
'2024-08-03': [36.0,4.31,152.98,None,175.5,5.94,None,211.0,None,512.08,112.6,22.52],
'2024-06-17': [47.0,3.38,158.94,None,182.0,4.87,None,229.0,None,550.24,115.3,23.06],
'2024-02-01': [44.0,3.12,136.84,None,154.1,4.51,None,198.0,None,465.04,86.3,17.26],
'2023-10-16': [45.0,3.11,140.14,None,162.0,4.6,None,207.0,None,501.44,109.3,21.86],
'2023-05-15': [42.0,3.8,159.58,None,192.0,5.57,None,234.0,None,597.68,162.1,32.42]})
gt_sample_df
The various date columns in this dataframe gt_sample_df
contain the results and I want to compare with the Upper Range and highlight those whose values are higher than Upper Range column. There can be n number of date columns with any date so I can't use static names for columns.
I have tried:
from great_tables import GT, md, style, loc, google_font
(GT(gt_sample_df)
.tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
locations=loc.body(columns=pl.exclude(["Test",'Lower Range','Upper Range']),
rows=pl.col(lambda x: x) > pl.col('Upper Range')))
)
from great_tables import GT, md, style, loc, google_font
(GT(gt_sample_df)
.tab_style(style=[style.text(color="Navy"), style.fill(color="red")],
locations=loc.body(columns=[3:],
rows=pl.col(lambda x: x) > pl.col('Upper Range')))
)
As I only want to highlight values greater in the date columns so I was trying to exclude first 3 columns in the column selection but it didn't work and I am not sure how to automatically compare values of all other Date columns to Upper Range column.
Update: Column selection I am able to do it but not able to select proper rows
columns_required = gt_sample_df.select(pl.exclude(["Test",'Lower Range','Upper Range'])).columns
(GT(gt_sample_df)
.tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
locations=loc.body(columns=columns_required, rows=pl.col(lambda x: x) > pl.col('Upper Range')))
)
import polars.selectors as cs
(GT(gt_sample_df)
.tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
locations=loc.body(columns=cs.starts_with("20"), rows=pl.col(lambda x: x) > pl.col('Upper Range')))
)
columns_required = gt_sample_df.select(pl.exclude(["Test",'Lower Range','Upper Range'])).columns
(GT(gt_sample_df)
.tab_style(style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
for col_name in columns_required:
locations=loc.body(columns=[col_name],
rows=pl.col(col_name) > pl.col('Upper Range')))
)
this also didn't work.
Desired Output probably something like this:
You were pretty close in your last snippet! For the locations
argument to accept a list, it needs to be done in a list comprehension, or ahead of time, outside the tab_style
method call.
Link to an example of this in the docs can be found here
import polars as pl
from great_tables import GT, loc, style
# define `gt_sample_df` as per example snippet
required_columns = gt_sample_df.drop("Test", "Lower Range", "Upper Range").columns
(
GT(gt_sample_df)
# `gt_sample_df.style` works here too
.tab_style(
style=[style.text(color="Navy"), style.fill(color="PaleTurquoise")],
locations=[
loc.body(columns=col_name, rows=pl.col(col_name) > pl.col("Upper Range"))
for col_name in required_columns
],
)
)
Edit: See this answer and this discussion on github for related solutions/discussions.
Disclaimer: I am no great tables expert, so it may be possible there is a better way to do this