pythonpandasdataframereplaceswitch-statement

How to replace values in a pandas dataframe column based on a condition


I have a dataframe that looks like this

import pandas as pd
    
all_df = pd.DataFrame(
    [
        ('A','Area1','AA','A B D E','A B','D E'),
        ('B','Area1','AA','A B D E','A B','D E'),
        ('C','Area2','BB','C','C','C'),
        ('E','Area1','CC','A B D E','A B','D E'),
        ('F','Area3','BB','F G','G','F')
    ],
    columns = ['Name','Area','Type','Group','AA members','CC members']
)

    
 Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB      F G          G          F

The last row (row 4) is incorrect. Anything that is type BB should only have itself (F) in Group, AA members, CC members. In other words, it should look like this:

4    F  Area3   BB        F          F          F

To do this I tried:

  1. Check when Type is BB and length of Group is 2:

    all_df.loc[all_df['Type']== 'BB']['Group'].str.split().str.len() == 2
    
  2. Iterate over every row and to find the cases like this

  3. Make a new df with all these rows and make the Group, AA members, CC members = Name

  4. Drop the rows where that happens in all_df.

  5. Merge 3 back in to all_df.

Is there a better pandas way to do this?


Solution

  • We can use boolean indexing to replace values here. The key here is that we don't really need to check if the Group value is correct or not; we can simply assign the Name to the latter three columns if Type is BB.

    # identify rows where Type is BB
    m = all_df['Type'] == 'BB'
    # for Type BB rows, replace Group, AA members and CC members values by Name
    all_df.loc[m, ['Group', 'AA members', 'CC members']] = all_df.loc[m, 'Name']
    

    The above filters the rows where Type is BB and replaces the values in the relevant columns. Another way is to use mask to replace values in the relevant columns.

    cols = ['Group', 'AA members', 'CC members']
    all_df[cols] = all_df[cols].mask(all_df['Type'] == 'BB', all_df['Name'], axis=0)
    

    Both return the desired output:

      Name   Area Type    Group AA members CC members
    0    A  Area1   AA  A B D E        A B        D E
    1    B  Area1   AA  A B D E        A B        D E
    2    C  Area2   BB        C          C          C
    3    E  Area1   CC  A B D E        A B        D E
    4    F  Area3   BB        F          F          F