pythonpandasreplacegroup-byconditional-statements

Pandas: replace specific characters in column based on condition using groupby


I have the following dataframe:

df = pd.DataFrame({
'ACCOUNT': [1, 1, 2, 2], 
'ASSET_CLASS': ['11201XX FI CHF', '11201XX FI CHF', 
                '12201XX FI EUR', '12201XX FI EUR',
               ],
})

display(df)

    ACCOUNT ASSET_CLASS     
0   1       11201XX FI CHF  
1   1       11201XX FI CHF  
2   2       12201XX FI EUR  
3   2       12201XX FI EUR  

For a list of accounts, I'd like to replace "XX" in column "ASSET_CLASS" by "01" and "02", respectively using the groupby function and "ACCOUNT" as groupby-variable. In addition, the name of the value should then be adjusted according to the assigned value. For instance, if "01" ("02"), then "FI CHF" equals "FI CHF Gov" ("FI CHF Corporate"). This is how it should look like:

df = pd.DataFrame({
    'ACCOUNT': [1, 1, 2, 2], 
    'ASSET_CLASS': ['1120101 FI CHF Gov', '1120102 FI CHF Corporate',
                    '1220101 FI EUR Gov', '1120102 FI EUR Corporate',
                   ],
})

display(df)

    ACCOUNT ASSET_CLASS
0   1       1120101 FI CHF Gov
1   1       1120102 FI CHF Corporate
2   2       1220101 FI EUR Gov
3   2       1120102 FI EUR Corporate

Thanks a lot in advance for any suggestions!


Solution

  • Code

    s1 = df.groupby('ACCOUNT').cumcount().add(1).astype('str').str.zfill(2)
    m = {'01': ' Gov', '02': ' Corporate'}
    s2 = df['ASSET_CLASS'].str.split('XX')
    df['ASSET_CLASS'] = s2.str[0] + s1 + s2.str[1] + s1.map(m)
    

    df

        ACCOUNT ASSET_CLASS
    0   1       1120101 FI CHF Gov
    1   1       1120102 FI CHF Corporate
    2   2       1220101 FI EUR Gov
    3   2       1220102 FI EUR Corporate
    

    update my answer with additional question

    when account has CHF & EUR both, and you want split them use following code

    example:

    df = pd.DataFrame({
        'ACCOUNT': [1, 1, 1, 1, 2, 2], 
        'ASSET_CLASS': ['11201XX FI CHF', '11201XX FI CHF', 
                        '12201XX FI EUR', '12201XX FI EUR',
                        '12201XX FI EUR', '12201XX FI EUR',
                       ],
    })
    

    code:

    s1 = df.groupby(['ACCOUNT', df['ASSET_CLASS'].str.split(' ').str[-1]]).cumcount()\
           .add(1).astype('str').str.zfill(2)
    m = {'01': ' Gov', '02': ' Corporate'}
    s2 = df['ASSET_CLASS'].str.split('XX')
    df['ASSET_CLASS'] = s2.str[0] + s1 + s2.str[1] + s1.map(m)
    

    df

        ACCOUNT ASSET_CLASS
    0   1       1120101 FI CHF Gov
    1   1       1120102 FI CHF Corporate
    2   1       1220101 FI EUR Gov
    3   1       1220102 FI EUR Corporate
    4   2       1220101 FI EUR Gov
    5   2       1220102 FI EUR Corporate