pandasdataframedictionarytagging

Assigning a value to a column based on a mapping defined in a dictionary


I am trying to implement a code that reads a csv file, creates a data frame out of it, and then tags each row the name of the key, if one of the columns in that row contains the same string as in the key of the dataframe.

As an example, I have the following dictionary defined:

Sdiction={
        "Mgage" : ["ABC Gage","XYZ Gage"],
        "Rate" : ["deg/min","rad/s","rpm"]}

And I have the following dataframe:

Col A Col B Col C Col D
1 30 ABC Gage
2 45 deg/min
3 150 Gage

I want to tag Col D for each row as

Row 1 - Col D = Mgage (since ABC Gage exists in the key Mgage)

Row 2 - Col D = Rate (Since deg/min exists in the key Rate)

Row 3 - Col D = Mgage (since the string Gage exists in the key Mgage, albeit partially)

Expected output:

Col A Col B Col C Col D
1 30 ABC Gage Mgage
2 45 deg/min Rate
3 150 Gage Mgage

I am trying to figure out how to implement this part, have not yet implemented it, and therefore need help.


Solution

  • Using a regex match:

    import re
    
    s = df['Col C'].str.casefold()
    pattern = '(%s)' % '|'.join(map(re.escape, s))
    # '(abc\\ gage|deg/min|gage)'
    
    # reverse dictionary
    tmp = pd.Series({v.casefold(): k for k, l in Sdiction.items()
                     for v in l}, name='ref').reset_index()
    
    # extract first match, map reference key
    df['Col D'] = s.map(tmp.assign(match=tmp['index'].str.extract(pattern))
                           .dropna(subset=['match'])
                           .set_index('match')['ref']
                        )
    

    Output:

       Col A  Col B     Col C  Col D
    0      1     30  ABC Gage  Mgage
    1      2     45   deg/min   Rate
    2      3    150      Gage  Mgage