I am trying to validate a dataframe column df['Postcode'] containing Scottish postcodes. I have two CSVs containing almost all possible Scottish postcodes (small_df or large_df), and I wish to loop through these and (for now) return the postcodes in my original dataframe that do not match any of the entries in these CSVs.
The data in each dataframe (simplified below) is a UK postal code stripped of spaces, e.g. PA29DE, of string type.
Case | Postcode |
---|---|
1 | PA29DE |
2 | PH29AD |
3 | nan |
4 | KW102ZE |
5 | KW123DE |
I am using the following loop to do this, but it simply returns a list of all the entries in df['Postcode'].
    for i in df['Postcode']:
        if i not in small_df['Postcode'] or large_df['Postcode']:
            print(i)
I was expecting only the entries in df which are not in small_df or large_df. I'm really not sure how to proceed from here, and I can't find any other solutions which work.
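There is a mistake in your condition: 'or' joins two conditions, but large_df['Postcode'] on its own is not a condition, so the membership test has to be written out a second time and combined with 'and'. Note also that 'in' applied to a pandas Series checks the index labels rather than the values, which is why every postcode is reported as missing; test against .values instead:
    for i in df['Postcode']:
        if i not in small_df['Postcode'].values and i not in large_df['Postcode'].values:
            print(i)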
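
If a loop is not required, the same check can be done in one vectorised step with Series.isin, which compares values (unlike 'in' on a Series, which looks at the index labels). This is only a sketch: the small frames below are made-up stand-ins for your CSV data, and dropping the nan rows before checking is an assumption about what you want to happen to missing postcodes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframes.
df = pd.DataFrame({
    "Case": [1, 2, 3, 4, 5],
    "Postcode": ["PA29DE", "PH29AD", np.nan, "KW102ZE", "KW123DE"],
})
small_df = pd.DataFrame({"Postcode": ["PA29DE"]})
large_df = pd.DataFrame({"Postcode": ["PH29AD", "KW102ZE"]})

# Combine both reference lists into one array of known postcodes.
valid = pd.concat([small_df["Postcode"], large_df["Postcode"]]).dropna().unique()

# Assumption: missing postcodes are ignored rather than reported as invalid.
postcodes = df["Postcode"].dropna()

# Keep only the postcodes that appear in neither reference list.
unmatched = postcodes[~postcodes.isin(valid)]
print(unmatched.tolist())
```

With the sample data above, only KW123DE is left over. If nan rows should be flagged as invalid too, skip the dropna() on df['Postcode'].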