I'm trying to apply a pandas style to my DataFrame and add a column containing a string that describes the matching result.
This is what I want to achieve: Link
Below is my code. An expert on Stack Overflow helped me with the df.style part, so I believe that part is correct based on my tests. However, how can I run iterrows(), check the cell in each column, and return/store a string in the new column 'check'? Thank you so much. I've been trying to debug this but can't get it to display what I want.
df = pd.DataFrame([[10,3,1], [3,7,2], [2,4,4]], columns=list("ABC"))
df['check'] = None

def highlight(x):
    c1 = 'background-color: yellow'
    m = pd.concat([(x['A'] > 6), (x['B'] > 2), (x['C'] < 3)], axis=1)
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    return df1.mask(m, c1)

def check(v):
    for index, row in v[[A]].iterrows():
        if row[A] > 6:
            A_check = f'row:{index},' + '{0:.1f}'.format(row[A]) + ">6"
            return A_check
    for index, row in v[[B]].iterrows():
        if row[B] > 2:
            B_check = f'row:{index}' + '{0:.1f}'.format(row[B]) + ">2"
            return B_check
    for index, row in v[[C]].iterrows():
        if row[C] < 3:
            C_check = f'row:{index}' + '{0:.1f}'.format(row[C]) + "<3"
            return C_check

df['check'] = df.apply(lambda v: check(v), axis=1)
df.style.apply(highlight, axis=None)
This is the error message I got:
NameError: name 'A' is not defined
My understanding is that the following produces what you are trying to achieve with the check function:
def check(v):
    row_str = 'row:{}, '.format(v.name)
    checks = []
    if v['A'] > 6:
        checks.append(row_str + '{:.1f}'.format(v['A']) + ">6")
    if v['B'] > 2:
        checks.append(row_str + '{:.1f}'.format(v['B']) + ">2")
    if v['C'] < 3:
        checks.append(row_str + '{:.1f}'.format(v['C']) + "<3")
    return '\n'.join(checks)

df['check'] = df.apply(check, axis=1)
Result (print(df)):
    A  B  C                                      check
0  10  3  1  row:0, 10.0>6\nrow:0, 3.0>2\nrow:0, 1.0<3
1   3  7  2                 row:1, 7.0>2\nrow:1, 2.0<3
2   2  4  4                               row:2, 4.0>2
(Replace '\n' with ' ' if you don't want the line breaks in the result.)
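If you expect to change the separator more than once, one option is to make it a parameter. The following is only a sketch of that idea: the sep argument is my own addition for illustration and is not part of the code above. It relies on DataFrame.apply forwarding extra keyword arguments to the function.

```python
import pandas as pd

df = pd.DataFrame([[10, 3, 1], [3, 7, 2], [2, 4, 4]], columns=list("ABC"))

# Same checks as above; `sep` is a hypothetical extra parameter (an assumption,
# not part of the original function) that controls how the messages are joined.
def check(v, sep=' '):
    row_str = 'row:{}, '.format(v.name)
    checks = []
    if v['A'] > 6:
        checks.append(row_str + '{:.1f}'.format(v['A']) + ">6")
    if v['B'] > 2:
        checks.append(row_str + '{:.1f}'.format(v['B']) + ">2")
    if v['C'] < 3:
        checks.append(row_str + '{:.1f}'.format(v['C']) + "<3")
    return sep.join(checks)

# Keyword arguments that apply does not recognise are passed on to check.
df['check'] = df.apply(check, axis=1, sep='; ')
print(df)
```

Rows where no condition matches would get an empty string, because joining an empty list returns ''.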
The axis=1 option in apply gives the function check one row of df as a Series whose index is the column names of df (this is v). With v.name you get the corresponding row index. Therefore I don't see the need to use .iter... here. Did I miss something?
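If it helps to see this directly, here is a small sketch that reports what apply hands to the function for each row. The helper name describe_row is made up for this illustration. It also shows why the column label has to be a quoted string: v['A'] looks up the column named A, whereas a bare A is treated as a Python variable, which is where the NameError in the question comes from.

```python
import pandas as pd

df = pd.DataFrame([[10, 3, 1], [3, 7, 2], [2, 4, 4]], columns=list("ABC"))

# Hypothetical helper, for illustration only: describe the object apply passes in.
def describe_row(v):
    # v is a Series: its index holds df's column names, v.name is the row label.
    return f"{type(v).__name__} name={v.name} index={list(v.index)} A={v['A']}"

described = df.apply(describe_row, axis=1)
print(described[0])
```

Since each call already receives exactly one row, there is nothing left to iterate over inside the function.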