pandasdataframelistmatchisin

How to check if a pandas data frame column contains any value from a list and return that value


I have list of countries:

countries = ["Afghanistan", "Albania",  "Algeria",  "Andorra",  "Angola",   "Antigua and Barbuda",  "Argentina",    "Armenia",  "Austria",  "Azerbaijan",   "Bahrain",  "Bangladesh",   "Barbados", "Belarus",  "Belgium",  "Belize",   "Benin",    "Bhutan",   "Bolivia",  "Bosnia and Herzegovina",   "Botswana", "Brazil",   "Brunei",   "Bulgaria", "Burkina Faso", "Burundi",  "Cabo Verde",   "Cambodia", "Cameroon", "Canada",   "Central African Republic", "Chad", "Channel Islands",  "Chile",    "China",    "Colombia", "Comoros",  "Congo",    "Costa Rica",   "Côte d'Ivoire",    "Croatia",  "Cuba", "Cyprus",   "Czech Republic",   "Denmark",  "Djibouti", "Dominica", "Dominican Republic",   "DR Congo", "Ecuador",  "Egypt",    "El Salvador",  "Equatorial Guinea",    "Eritrea",  "Estonia",  "Eswatini", "Ethiopia", "Faeroe Islands",   "Finland",  "France",   "French Guiana",    "Gabon",    "Gambia",   "Georgia",  "Germany",  "Ghana",    "Gibraltar",    "Greece",   "Grenada",  "Guatemala",    "Guinea",   "Guinea-Bissau",    "Guyana",   "Haiti",    "Holy See", "Honduras", "Hong Kong",    "Hungary",  "Iceland",  "India",    "Indonesia",    "Iran", "Iraq", "Ireland",  "Isle of Man",  "Israel",   "Italy",    "Jamaica",  "Japan",    "Jordan",   "Kazakhstan",   "Kenya",    "Kuwait",   "Kyrgyzstan",   "Laos", "Latvia",   "Lebanon",  "Lesotho",  "Liberia",  "Libya",    "Liechtenstein",    "Lithuania",    "Luxembourg",   "Macao",    "Madagascar",   "Malawi",   "Malaysia", "Maldives", "Mali", "Malta",    "Mauritania",   "Mauritius",    "Mayotte",  "Mexico",   "Moldova",  "Monaco",   "Mongolia", "Montenegro",   "Morocco",  "Mozambique",   "Myanmar",  "Namibia",  "Nepal",    "Netherlands",  "Nicaragua",    "Niger",    "Nigeria",  "North Korea",  "North Macedonia",  "Norway",   "Oman", "Pakistan", "Panama",   "Paraguay", "Peru", "Philippines",  "Poland",   "Portugal", "Qatar",    "Réunion",  "Romania",  "Russia",   "Rwanda",   "Saint Helena", "Saint Kitts and Nevis",    "Saint Lucia",  "Saint Vincent and the Grenadines", "San Marino",   "Sao Tome & Principe",  "Saudi Arabia", "Senegal",  "Serbia",   "Seychelles",   "Sierra Leone", "Singapore",    "Slovakia", "Slovenia", "Somalia",  "South Africa", "South Korea",  "South Sudan",  "Spain",    "Sri Lanka",    "State of Palestine",   "Sudan",    "Suriname", "Sweden",   "Switzerland",  "Syria",    "Taiwan",   "Tajikistan",   "Tanzania", "Thailand", "The Bahamas",  "Timor-Leste",  "Togo", "Trinidad and Tobago",  "Tunisia",  "Turkey",   "Turkmenistan", "Uganda",   "Ukraine",  "United Arab Emirates", "United Kingdom",   "United States",    "Uruguay",  "Uzbekistan",   "Venezuela",    "Vietnam",  "Western Sahara",   "Yemen",    "Zambia",   "Zimbabwe"]

I also have this pandas dataframe:

import pandas as pd
import numpy as np

ds1 = {'remarks':["DOB 21 Mar 1974; POB Baghdad, Iraq.","DOB 26 Mar 1969; POB Tunis, Tunisia; Italian Fiscal Code TLLLHR69C26Z352G.","DOB 10 Jun 1970; POB Tunis, Tunisia; nationality Tunisia; Passport L550681 issued 23 Sep 1997 expires 22 Sep 2002; Italian Fiscal Code WDDHBB70H10Z352O."], "Latitude" : [-23.69057,-23.41165,-23.51482]}
df1 = pd.DataFrame(data=ds1)

The dataframe looks like this:

print(df1)

enter image description here

I need to:

From the example above, the resulting dataframe would look like this:

enter image description here

Can anyone help me please?


Solution

  • The power of str.extract() really shines for problems like these

    df1['country'] = df1['remarks'].str.extract(("(" + "|".join(countries) +")"), expand=False)
    print(df1)
    
                                                 remarks  Latitude  country
    0                DOB 21 Mar 1974; POB Baghdad, Iraq. -23.69057     Iraq
    1  DOB 26 Mar 1969; POB Tunis, FakeCountry; Itali... -23.41165      NaN #I replaced the Country here to show a test case
    2  DOB 10 Jun 1970; POB Tunis, Tunisia; nationali... -23.51482  Tunisia