I have a dictionary/country list which i took from pycountry (list(pycountry.countries)
I am importing a column from excel called CountryNames
i thought by creating 2 for loops, i could then compare to see if my CountryNames is found in the PyCountry list
import pandas as pd
from langdetect import detect
from googletrans import Translator
import pycountry
df = pd.read_excel(r"V1.xlsx")
input_countries = df['CountryName']
# Ensure the column 'CountryName' exists in the CSV
if 'Sold To Customer Country' not in df.columns:
raise ValueError("CountryName' column")
t = list(pycountry.countries)
# loop through column of countries
for name in input_countries:
# loop through country name (pycountry)
for pycountry.name in t:
# comparison - Show country names
if input_countries in t:
input_countries
else:
'invalid'enter code here
I want to check to see if a countryName (from the excel column) is in the pycountry list
You need two sets. One is a set formed from all entries in the "CountryName" column in the spreadsheet. The other is a combination of all known country names both in their short forms and "official" formats.
Then you just need a simple loop to validate your data.
Something like this should suffice:
import pycountry
import pandas as pd
from pathlib import Path
FILENAME = "V1.xlsx"
def all_names() -> set[str]:
# all Country objects have a "name" attribute
return {country.name for country in pycountry.countries} # type: ignore
def all_official_names() -> set[str]:
s: set[str] = set()
for country in pycountry.countries:
# not all Country objects have an "official_name" attribute
try:
s.add(country.official_name) # type: ignore
except AttributeError:
pass
return s
def get_df_countries(filename: Path) -> set[str]:
# construct a set because country names may be duplicated in the spreadsheet column
# this potentially improves runtime performance when parsing the country names later
return set(pd.read_excel(filename)["CountryName"])
if __name__ == "__main__":
names = all_names() | all_official_names()
for country in get_df_countries(Path(FILENAME)):
status = "valid" if country in names else "invalid"
print(f"{country} is {status}")
Caveat Emptor
You're limited to 2 forms of country names provided by the pycountry module. Due to typing errors and other subtle variations of country names you may get false negatives from this code