
Pandas: add a new column with a string where a cell matches a particular condition


I'm trying to apply a Pandas style to my dataset and add a column containing a string that describes the matching result.

This is what I want to achieve: Link


Below is my code. An expert on Stack Overflow helped me with the df.style part, and based on my tests I believe that part is correct. However, how can I run iterrows(), check the cell in each column, and return/store a string in the new column 'check'? I have been trying to debug this but cannot get it to display what I want. Thank you so much.

import pandas as pd

df = pd.DataFrame([[10,3,1], [3,7,2], [2,4,4]], columns=list("ABC"))

df['check'] = None

def highlight(x):
    c1 = 'background-color: yellow'
    m = pd.concat([(x['A'] > 6), (x['B'] > 2), (x['C'] < 3)], axis=1)
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    return df1.mask(m, c1)

def check(v):

    for index, row in v[[A]].iterrows():
        if row[A] > 6: 
            A_check = f'row:{index},' + '{0:.1f}'.format(row[A]) + ">6"
            return A_check

    for index, row in v[[B]].iterrows():
        if row[B] > 2:
            B_check = f'row:{index}' + '{0:.1f}'.format(row[B]) + ">2"
            return B_check

    for index, row in v[[C]].iterrows():
        if row[C] < 3:
            C_check = f'row:{index}' + '{0:.1f}'.format(row[C]) + "<3"
            return C_check


df['check'] = df.apply(lambda v: check(v), axis=1)

df.style.apply(highlight, axis=None)

This is the error message I got:

NameError: name 'A' is not defined


Solution

  • My understanding is that the following produces what you are trying to achieve with the check function:

    def check(v):
        row_str = 'row:{}, '.format(v.name)
        checks = []
        if v['A'] > 6: 
            checks.append(row_str + '{:.1f}'.format(v['A']) + ">6")
        if v['B'] > 2:
            checks.append(row_str + '{:.1f}'.format(v['B']) + ">2")
        if v['C'] < 3:
            checks.append(row_str + '{:.1f}'.format(v['C']) + "<3")    
        return '\n'.join(checks)
    
    df['check'] = df.apply(check, axis=1)
    

    Result (print(df)):

        A  B  C                                      check
    0  10  3  1  row:0, 10.0>6\nrow:0, 3.0>2\nrow:0, 1.0<3
    1   3  7  2                 row:1, 7.0>2\nrow:1, 2.0<3
    2   2  4  4                               row:2, 4.0>2
    

    (Replace \n with ' ' if you don't want the line breaks in the result.)

    With axis=1, apply calls check once per row of df and passes that row as a Series whose index is the column names of df (this is v in the function above). v.name gives the corresponding row index. Therefore I don't see the need to use .iter.... Did I miss something?
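
To make the point about axis=1 concrete, here is a minimal sketch that shows what the function actually receives on each call. The helper name describe is hypothetical and is not part of the code above; it only reports the type, name and index of its argument. It also hints at why the original code raised a NameError: columns have to be looked up with string labels such as v['A'], whereas a bare A is an undefined Python name.

```python
import pandas as pd

df = pd.DataFrame([[10, 3, 1], [3, 7, 2], [2, 4, 4]], columns=list("ABC"))

def describe(v):
    # Hypothetical helper: v is one row of df as a Series.
    # v.name is the row label, v.index holds the column names.
    return f"{type(v).__name__} name={v.name} index={list(v.index)}"

# axis=1 means "call the function once per row"
seen = df.apply(describe, axis=1)

for line in seen:
    print(line)

# Columns are accessed by string label; row 0 has A == 10
first_row = df.iloc[0]
print(first_row["A"] > 6)
```

Each printed line should report a Series whose name is the row index (0, 1, 2) and whose index is ['A', 'B', 'C'], which is why no explicit row iteration is needed inside check.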