pythonpandasdataframedata-manipulation

Problem with mapping codes to county names in Python | Pandas


I have a dataset containing almost 400 counties from Poland. I need to merge several files containing data related to these counties. Each county is identified by its name and a code. However, there's a challenge: there are 10 pairs of counties with the same name. For example, there is a county called 'powiat brzeski' in both the Małopolskie and Opolskie provinces. To differentiate between these counties, I want to add the name of the province next to the county name using Python. I attempted to do this with the below method, but encountered an issue: after the manipulation, the values in the 'County' column, except for those affected by the manipulation, turned into NaN. Could you assist me with resolving this issue?

  import pandas as pd

  # Sample DataFrame
data = {'Code': [1202000, 2402000, 802000, 3017000, 3005000, 9999999], # Added an unmatched code 9999999
        'County': ['Powiat brzeski', 'Powiat bielski', 'Powiat krośnieński', 'Powiat ostrowski', 'Powiat grodziski', 'Powiat ciechanowski']} # Added an unmatched county
df = pd.DataFrame(data)

 
# Print updated DataFrame
print(df)

Result:

     Code               County
0  1202000       Powiat brzeski
1  2402000       Powiat bielski
2   802000   Powiat krośnieński
3  3017000     Powiat ostrowski
4  3005000     Powiat grodziski
5  9999999  Powiat ciechanowski


data = {'Code': [1202000, 2402000, 802000, 3017000, 3005000, 9999999], # Added an unmatched code 9999999
        'County': ['Powiat brzeski', 'Powiat bielski', 'Powiat krośnieński', 'Powiat ostrowski', 'Powiat grodziski', 'Powiat ciechanowski']} # Added an unmatched county
df = pd.DataFrame(data)

# Dictionary mapping codes to county names
code_to_county = {
    1202000: "Powiat brzeski_Malopolskie",
    2402000: "Powiat bielski_Slaskie",
    802000: "Powiat krośnieński_Lubuskie",
    3017000: "Powiat ostrowski_Wielkopolskie",
    3005000: "Powiat grodziski_Wielkopolskie"
}

# Update values in "County" column based on values in "Code" column
df['County'] = df['Code'].map(code_to_county)

# Print updated DataFrame
print(df)

Result:

      Code                          County
0  1202000      Powiat brzeski_Malopolskie
1  2402000          Powiat bielski_Slaskie
2   802000     Powiat krośnieński_Lubuskie
3  3017000  Powiat ostrowski_Wielkopolskie
4  3005000  Powiat grodziski_Wielkopolskie
5  9999999                             NaN

Solution

  • You can use combine_first:

    In [79]: df['County'] = df['Code'].map(code_to_county).combine_first(df['County'])
    
    In [80]: df
    Out[80]: 
          Code                          County
    0  1202000      Powiat brzeski_Malopolskie
    1  2402000          Powiat bielski_Slaskie
    2   802000     Powiat krośnieński_Lubuskie
    3  3017000  Powiat ostrowski_Wielkopolskie
    4  3005000  Powiat grodziski_Wielkopolskie
    5  9999999             Powiat ciechanowski