pythonpandas

How can I create a column that is the result of replacing values in two or more other columns?


Consider the dataframe df:

import pandas as pd

df = pd.DataFrame({'Event1': ['Music', 'Something else 1', 'Theatre', 'Comedy'],
               'Event2': ['Something else 2', 'Ballet', 'Something else 3', 'Something else 4'],
               'Cost': [10000, 5000, 15000, 2000]})
print(df)

             Event1            Event2   Cost
0             Music  Something else 2  10000
1  Something else 1            Ballet   5000
2           Theatre  Something else 3  15000
3            Comedy                 4   2000

I would like to map the values of the Event1 and Event2 to the values in the respective dictionaries:

 # Mapping tables
 dict1 = {'Music': 'M', 'Cooking': 'C', 'Theatre': 'T', 'Comedy': 'C'}
 dict2 = {'Ballet': 'B', 'Swimming': 'S'}

And store the mappings in a common column because I know that per row, only the value of one column will be mapped. The end result would be:

# desired outcome
result = pd.DataFrame({'Event1': ['Music', 'Something else 1', 'Theatre', 'Comedy'],
                   'Event2': ['Something else 2', 'Ballet', 'Something else 3', '4'],
                   'Event': ['M', 'B', 'T', 'C'],
                   'Cost': [10000, 5000, 15000, 2000]})

print(result)

              Event1            Event2 Event   Cost
0             Music  Something else 2     M  10000
1  Something else 1            Ballet     B   5000
2           Theatre  Something else 3     T  15000
3            Comedy                 4     C   2000

I can only do this in a messy and lengthy way and was hoping there is clean maybe idiomatic way of doing this.

How would you advise doing it?


Solution

  • You can combine map and fillna:

    df['Event'] = df['Event1'].map(dict1).fillna(df['Event2'].map(dict2))
    

    Output:

                 Event1            Event2   Cost Event
    0             Music  Something else 2  10000     M
    1  Something else 1            Ballet   5000     B
    2           Theatre  Something else 3  15000     T
    3            Comedy  Something else 4   2000     C
    

    If you imagine a more complex input with an arbitrary number of columns, you could generalize with a dictionary to pair the columns/dictionaries, then:

    dict3 = {'Comedy': 'C'}
    dicts = {'Event1': dict1, 'Event2': dict2, 'Event3': dict3}
    
    df['Event'] = (df[list(dicts)].apply(lambda s: s.map(dicts[s.name]))
                   .bfill(axis=1).iloc[:, 0]
                  )
    

    Output:

                 Event1            Event2          Event3   Cost Event
    0             Music  Something else 2  Something else  10000     M
    1  Something else 1            Ballet  Something else   5000     B
    2           Theatre  Something else 3  Something else  15000     T
    3    Something else  Something else 4          Comedy   2000     C
    

    The above approaches suppose that the mappings are per column, if you want global mappings, you can merge the dictionaries and use:

    from collections import ChainMap
    
    df['Event'] = (df.filter(like='Event').stack()
                     .map(dict(ChainMap(dict1, dict2, dict3)))
                     .groupby(level=0).first()
                  )
    
    # or
    from collections import ChainMap
    
    df['Event'] = (df.filter(like='Event')
                     .map(dict(ChainMap(dict1, dict2, dict3)).get)
                     .bfill(axis=1).iloc[:, 0]
                  )