
Pandas Map Multiple Columns With A Filter


I have a dataframe like this (simplified for this example):

 Site     LocationName    Resource#    
   01        Test Name            5
   01          Testing            6
   02       California           10
   02            Texas           11
   ...

Each site has its own mapping for LocationName and Resource#.

For example:

I am trying to map each site's rows using that site's mappings for the different columns. If a value has no mapping, I want the field to be None/blank.

My ideal output is:

 Site     LocationName    Resource#
   01     Another Test         5000
   01   RandomLocation
   02           CA-123          10A
   02                           11B

My idea was to filter for each site and run map on the series:

df01 = df[df.Site == '01']
df01['LocationName'] = df01['LocationName'].map({'Test Name': 'Another Test', 'Testing': 'RandomLocation'})

But this raises a SettingWithCopyWarning, since I am performing these operations on a copy.

Is there a simple way to achieve this?


Solution

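  • Yes, there is. Store the mappings in a nested dictionary keyed by site, then pass a function to the dataframe's apply method (with axis=1) that looks up each row's values in the mappings for that row's site.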

    import pandas as pd
    
    data = {
        'Site': ['01', '01', '02', '02'],
        'LocationName': ['Test Name', 'Testing', 'California', 'Texas'],
        'Resource#': [5, 6, 10, 11]
    }
    
    df = pd.DataFrame(data)
    
    mappings = {
        '01': {
            'LocationName': {'Test Name': 'Another Test', 'Testing': 'RandomLocation'},
            'Resource#': {5: '5000', 6: None}  
        },
        '02': {
            'LocationName': {'California': 'CA-123', 'Texas': None},
            'Resource#': {10: '10A', 11: '11B'}
        }
    }
    
    def apply_mappings(row):
        # Look up the mappings for this row's site; unknown sites are left unchanged.
        site = row['Site']
        if site in mappings:
            # dict.get returns None when a value has no mapping.
            location_map = mappings[site]['LocationName']
            row['LocationName'] = location_map.get(row['LocationName'])
    
            resource_map = mappings[site]['Resource#']
            row['Resource#'] = resource_map.get(row['Resource#'])
        return row
    
    df = df.apply(apply_mappings, axis=1)
    
    print(df)
    
    

    This gives the expected output:

      Site    LocationName Resource#
    0   01    Another Test      5000
    1   01  RandomLocation      None
    2   02          CA-123       10A
    3   02            None       11B
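
For larger dataframes, a row-wise apply can be slow. As an alternative, here is a minimal sketch that stays closer to the original idea of filtering per site and calling map, but writes the result back through a boolean mask with .loc on an explicit copy, which should avoid the SettingWithCopyWarning. The helper name map_by_site and its parameters are hypothetical, not part of pandas. It also assumes that unmapped values are simply omitted from each dictionary, so they come back as NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({
    "Site": ["01", "01", "02", "02"],
    "LocationName": ["Test Name", "Testing", "California", "Texas"],
    "Resource#": [5, 6, 10, 11],
})

# Same nested layout as above: site -> column -> {old value: new value}.
# Values without a replacement are left out of the dictionaries.
mappings = {
    "01": {
        "LocationName": {"Test Name": "Another Test", "Testing": "RandomLocation"},
        "Resource#": {5: "5000"},
    },
    "02": {
        "LocationName": {"California": "CA-123"},
        "Resource#": {10: "10A", 11: "11B"},
    },
}


def map_by_site(frame, site_maps, site_col="Site"):
    """Hypothetical helper: apply per-site value mappings column by column."""
    out = frame.copy()  # work on an explicit copy; the input is not modified
    for site, column_maps in site_maps.items():
        mask = out[site_col] == site  # rows belonging to this site
        for col, value_map in column_maps.items():
            # Cast to object so string replacements can be stored in a
            # column that started out numeric.
            out[col] = out[col].astype(object)
            # Series.map yields NaN for values missing from the dictionary.
            out.loc[mask, col] = out.loc[mask, col].map(value_map)
    return out


result = map_by_site(df, mappings)
print(result)
```

Rows whose site has no entry in mappings are left unchanged, as in the apply version. If None is preferred over NaN for the unmapped cells, they can be converted afterwards.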