My data frame:
data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen','Großkrotzenburg','Jesewitz/Weg','Kirchen (Sieg)','Laudenbach a. M.','Nachrodt-Wiblingwerde','Rehburg-Loccum','Dingen','Burg (Dithmarschen)'],
'Col2': ['Rehburg-Loccum','Grabfeld','Laudenbach','Kirchen','Jesewitz','Großkrotzenburg','Nachrodt-','Essen/Stadt','Bischofferode','Bad Homburg','Münster','Burg']}
df = pd.DataFrame(data)
I have two columns in my df as below:
col1 | col2 |
---|---|
Bad Homburg | Rehburg-Loccum |
Bischofferode | Grabfeld |
Essen | Laudenbach |
Grabfeld OT Rentwertshausen | Kirchen |
Großkrotzenburg | Jesewitz |
Jesewitz/Weg | Großkrotzenburg |
Kirchen (Sieg) | Nachrodt- |
Laudenbach a. M. | Essen/Stadt |
Nachrodt-Wiblingwerde | Bischofferode |
Rehburg-Loccum | Bad Homburg |
Dingen | Münster |
Burg (Dithmarschen) | Burg |
I would like to look up col1 data in Col2. If the item is present I would like to write in the same row under the column Lookup_Value and I also Comment percentage of Matching. Below is my expected result:
col1 | col2 | Lookup_value | Comment |
---|---|---|---|
Bad Homburg | Rehburg-Loccum | Bad Homburg | 100% Matched |
Bischofferode | Grabfeld | Bischofferode | 100% Matched |
Essen | Laudenbach a. M. | Essen/Stadt | Best Possible Match |
Grabfeld OT Rentwertshausen | Kirchen | Grabfeld | Best Possible Match |
Großkrotzenburg | Jesewitz | Großkrotzenburg | 100% Matched |
Jesewitz/Weg | Großkrotzenburg | Jesewitz | Best Possible Match |
Kirchen (Sieg) | Nachrodt- | Kirchen | Best Possible Match |
Laudenbach | Essen/Stadt | Laudenbach a. M. | Best Possible Match |
Nachrodt-Wiblingwerde | Bischofferode | Nachrodt- | Best Possible Match |
Rehburg-Loccum | Bad Homburg | Rehburg-Loccum | 100% Matched |
Dingen | Münster | No Match | |
Burg (Dithmarschen) | Burg | Burg | Best Possible Match |
I am trying this way but not working:
def lookup_value_and_comment(row):
col1_value = row['Col1']
col2_value = row['Col2']
if col1_value in col2_value:
if col1_value == col2_value:
return pd.Series([col1_value, '100% Matched'], index=['Lookup_value', 'Comment'])
else:
return pd.Series([col2_value, 'Best Possible Match'], index=['Lookup_value', 'Comment'])
else:
return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])
df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)
print(df)
Here col1_value == col2_value
you check the current row col1_value
with the current row col2_value
, but you need to check all the rows of column col2.
Since the rows are not all the same, I applied word division of columns col1, col2 by delimiters: -, /, and space
into the list creating columns col3, col4
.
To access rows and split them, str is used, filtering is done using isin.
import pandas as pd
df['col3'] = df['col1'].str.split('-|/| ').str[0]
df['col4'] = df['col2'].str.split('-|/| ').str[0]
def lookup_value_and_comment(row):
col1_value = row['col1']
col3_value = row['col3']
ind = df['col4'].isin([col3_value])
if (df['col2'].isin([col1_value])).any():
return pd.Series([col1_value, '100% Matched'],
index=['Lookup_value', 'Comment'])
elif ind.any():
return pd.Series([df.loc[ind, 'col2'].values[0],
'Best Possible Match'], index=['Lookup_value', 'Comment'])
else:
return pd.Series(['', 'No Match'], index=['Lookup_value', 'Comment'])
df[['Lookup_value', 'Comment']] = df.apply(lookup_value_and_comment, axis=1)
Output:
col1 col2 col3 col4 Lookup_value Comment
0 Bad Homburg Rehburg-Loccum [Bad, Homburg] [Rehburg, Loccum] Bad Homburg 100% Matched
1 Bischofferode Grabfeld [Bischofferode] [Grabfeld] Bischofferode 100% Matched
2 Essen Laudenbach a. M. [Essen] [Laudenbach, a., M.] Essen/Stadt Best Possible Match
3 Grabfeld OT Rentwertshausen Kirchen [Grabfeld, OT, Rentwertshausen] [Kirchen] Grabfeld Best Possible Match
4 Großkrotzenburg Jesewitz [Großkrotzenburg] [Jesewitz] Großkrotzenburg 100% Matched
5 Jesewitz/Weg Großkrotzenburg [Jesewitz, Weg] [Großkrotzenburg] Jesewitz Best Possible Match
6 Kirchen (Sieg) Nachrodt- [Kirchen, (Sieg)] [Nachrodt, ] Kirchen Best Possible Match
7 Laudenbach Essen/Stadt [Laudenbach] [Essen, Stadt] Laudenbach a. M. Best Possible Match
8 Nachrodt-Wiblingwerde Bischofferode [Nachrodt, Wiblingwerde] [Bischofferode] Nachrodt- Best Possible Match
9 Rehburg-Loccum Bad Homburg [Rehburg, Loccum] [Bad, Homburg] Rehburg-Loccum 100% Matched
10 Dingen Münster [Dingen] [Münster] No Match
11 Burg (Dithmarschen) Burg [Burg, (Dithmarschen)] [Burg] Burg Best Possible Match