I have two dataframes which look something like the following:
issue_df
COL1 COL2 REF
0 Name1 Val1 [REF1, REF2]
1 Name2 Val2 None
2 Name3 Val3 [REF1, REF2, REF3]
3 Name4 Val4 REF2
ref_df
NAME SCORE
0 REF1 100.0
1 REF2 99.0
2 REF3 85.0
I want to iterate through the rows of issue_df
and query the ref_df
for the values in the REF
column and create a new merged data frame like the following output:
merged_df
COL1 COL2 REF REF_INFO
0 Name1 Val1 [REF1, REF2] REF1 : 100.0
REF2 : 99.0
1 Name2 Val2 None None
2 Name3 Val3 [REF1, REF2, REF3] REF1 : 100.0
REF2 : 99.0
REF3 : 85.0
3 Name4 Val4 REF2 REF2 : 99.0
Then ideally I would be able to query this merged_df
for any row where it has a ref where the score is less than a certain amount.
I've tried creating this as a dictionary first, where REF_INFO
includes a column of nested dictionaries and then converting it to a dataframe, but this produces errors and warnings.
Test code I'm using:
import pandas as pd
issue_data = {
'COL1': ['Name1', 'Name2', 'Name3', 'Name4'],
'COL2': ['Val1', 'Val2', 'Val3', 'Val4'],
'REF': [['REF1', 'REF2'], None, ['REF1', 'REF2', 'REF3'], 'REF2']}
ref_data = {
'NAME': ['REF1', 'REF2', 'REF3'],
'SCORE': [100.0, 99.0, 85.0]}
issue_df = pd.DataFrame(issue_data)
ref_df = pd.DataFrame(ref_data)
Any ideas on how to accomplish what I'm going for?
You can explode
on column 'REF'
of issue_df
, then map
the values from ref_df
:
merged_df = issue_df.explode("REF")
merged_df["REF_INFO"] = merged_df["REF"].map(ref_df.set_index("NAME")["SCORE"])
COL1 COL2 REF REF_INFO
0 Name1 Val1 REF1 100.0
0 Name1 Val1 REF2 99.0
1 Name2 Val2 - NaN
2 Name3 Val3 REF1 100.0
2 Name3 Val3 REF2 99.0
2 Name3 Val3 REF3 85.0
3 Name4 Val4 REF2 99.0
Then query merged_df
to get rows less than a certain value:
merged_df.query("REF_INFO < 100")
COL1 COL2 REF REF_INFO
0 Name1 Val1 REF2 99.0
2 Name3 Val3 REF2 99.0
2 Name3 Val3 REF3 85.0
3 Name4 Val4 REF2 99.0
If the query should include other 'REF_INFO'
values for the same original row, you can instead get the index of the items that match the query and use that to filter merged_df
:
f = merged_df.query("REF_INFO < 90").index.values
q = merged_df[merged_df.index.isin(f)]
COL1 COL2 REF REF_INFO
2 Name3 Val3 REF1 100.0
2 Name3 Val3 REF2 99.0
2 Name3 Val3 REF3 85.0