I find this code very interesting, and I modified it a little to sharpen the question. Essentially, it uses one DataFrame to decide the styling of another DataFrame via DataFrame.style.
import pandas as pd

t1 = pd.DataFrame({'x':[300,200,700], 'y':[100,300,200]})
t2 = pd.DataFrame({'x':['A','B','C'], 'y':['C','B','D']})

def highlight_cell(val, props=''):
    # return the CSS string for values above 200, otherwise no style
    return props if val > 200 else ''

t2.style.apply(lambda x: t1.map(highlight_cell, props='background-color:yellow'), axis=None)
But can anyone explain how the last line works? I couldn't find pandas documentation that clarifies the behavior of df.map() inside another df.apply().
To me, the code reads like: for each item in t1, apply highlight_cell() to the entire t2 at once, like this pseudocode.
for x in all_items_in_t1:
    yield [highlight_cell(y) for y in all_items_in_t2]
However, the output says: for each item in t1, apply highlight_cell() only to the corresponding item in t2, the one at the same (x, y) location as that item in t1, like this.
for x, y in zip(all_items_in_t1, all_items_in_t2):
    yield highlight_cell(y)
I'm still having trouble understanding this pattern because it seems a bit confusing. Can anyone explain it more clearly?
DataFrame.style.apply is used here, not DataFrame.apply. DataFrame.style returns a Styler object, so the method being called is Styler.apply.
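With the parameter axis=None, the callable is applied once to the whole DataFrame (not per cell, row, or column). The lambda receives t2 as x but ignores it, so this essentially means we run:
t1.map(highlight_cell, props='background-color:yellow')
and use the output, a DataFrame of CSS strings with the same index and columns as t2, as the style for the cell of t2 at the same location: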
                         x                        y
0  background-color:yellow
1                           background-color:yellow
2  background-color:yellow
Note that using DataFrame.map here is not needed (and inefficient); it is better to go for a vectorized approach:
import numpy as np

t2.style.apply(lambda x: np.where(t1>200, 'background-color:yellow', ''), axis=None)
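As a quick sanity check, this sketch compares the two ways of building the CSS grid without involving the Styler at all. The hasattr fallback to applymap is my own addition, assuming you might be on a pandas version older than 2.1, where DataFrame.map does not exist yet.

```python
import numpy as np
import pandas as pd

t1 = pd.DataFrame({'x': [300, 200, 700], 'y': [100, 300, 200]})

def highlight_cell(val, props=''):
    return props if val > 200 else ''

# element-wise version: one Python call per cell
# (DataFrame.map needs pandas >= 2.1; fall back to applymap on older versions)
elementwise_map = t1.map if hasattr(t1, 'map') else t1.applymap
elementwise = elementwise_map(highlight_cell, props='background-color:yellow')

# vectorized version: a single comparison over the whole frame
vectorized = np.where(t1 > 200, 'background-color:yellow', '')

print(vectorized.tolist() == elementwise.to_numpy().tolist())  # True
```

Both produce the same grid of strings, so either can be returned from the function passed to t2.style.apply(..., axis=None).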