Suppose I have a dataframe of names and countries:
ID FirstName LastName Country
1 Paulo Cortez Brasil
2 Paulo Cortez Brasil
3 Paulo Cortez Espanha
4 Maria Lurdes Espanha
5 Maria Lurdes Espanha
6 John Page USA
7 Felipe Cardoso Brasil
8 John Page USA
9 Felipe Cardoso Espanha
10 Steve Xis UK
I need to identify every person whose first name and last name combination appears more than once in the dataframe, with at least one of those records belonging to a different country, and return all rows for those people. People who are duplicated only within a single country (such as Maria Lurdes or John Page) should be excluded. The result should be this dataframe:
ID FirstName LastName Country
1 Paulo Cortez Brasil
2 Paulo Cortez Brasil
3 Paulo Cortez Espanha
7 Felipe Cardoso Brasil
9 Felipe Cardoso Espanha
What would be the best way to achieve this?
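Use boolean indexing: count the distinct countries within each (FirstName, LastName) group, broadcast that count back to every row with transform, and keep the rows where it is greater than 1: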
# is the name present in several countries?
m = df.groupby(['FirstName', 'LastName'])['Country'].transform('nunique').gt(1)
out = df.loc[m]
Output:
   ID FirstName LastName  Country
0   1     Paulo   Cortez   Brasil
1   2     Paulo   Cortez   Brasil
2   3     Paulo   Cortez  Espanha
6   7    Felipe  Cardoso   Brasil
8   9    Felipe  Cardoso  Espanha
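For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the names `n_countries` and `out_filter` are only illustrative. The `groupby.filter` variant at the end is an alternative I am adding for comparison, not part of the approach above; it should select the same rows but calls a Python function once per group, so it is likely slower on large data.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "FirstName": ["Paulo", "Paulo", "Paulo", "Maria", "Maria",
                  "John", "Felipe", "John", "Felipe", "Steve"],
    "LastName": ["Cortez", "Cortez", "Cortez", "Lurdes", "Lurdes",
                 "Page", "Cardoso", "Page", "Cardoso", "Xis"],
    "Country": ["Brasil", "Brasil", "Espanha", "Espanha", "Espanha",
                "USA", "Brasil", "USA", "Espanha", "UK"],
})

# Number of distinct countries per (FirstName, LastName),
# repeated on every row of the group so it aligns with df
n_countries = df.groupby(["FirstName", "LastName"])["Country"].transform("nunique")

# True for rows whose name occurs in more than one country
m = n_countries.gt(1)
out = df.loc[m]

# Alternative (illustrative): keep whole groups that span several countries
out_filter = df.groupby(["FirstName", "LastName"]).filter(
    lambda g: g["Country"].nunique() > 1
)

print(out)
```

A name that shows up in more than one country necessarily has more than one row, so no separate check for duplicates is needed. Note that the original index is kept (0, 1, 2, 6, 8); add `.reset_index(drop=True)` if you want it renumbered.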