[SOLVED] Assemble dataframe from separate dfs, selecting particular cells

Assemble dataframe from separate dfs, selecting particular cells

I want to assemble a dataframe from three separate dfs, each of which have the contents of a column. The trick is I want each row of the new df to be dynamically created based on a year offset.

For example, my background theory is that there is a time lag that flows each year from Funding -> Staff -> Output, and I want to assemble the story of each row. Each row should start with a year N's Output, then have year N-2's Funding, then year N-1's Staff.

Desired final combined df

Name	ID	Year_Output	Output	Year_Funding	Funding	Year_Staff	Staff
ABC	1	2023	25	2021	500	2022	18
DEF	2	2022	51	2020	1000	2021	12

Example Output df (others are formatted the same)

Name	ID	Year	Output
ABC	1	2023	25
DEF	2	2022	51

What is the best way to assemble this programmatically in pandas? I've tried creating a new year column for Year_Funding that keys off the Year_Output (always N-2), but then how do I merge the specific ID-Year cell into the correct place?

MWE

import pandas as pd

d = {'Name': ['ABC', 'DEF'],
     'ID': [1, 2],
     'Year': [2023, 2022],
     'Output': [25, 51]}
output_df = pd.DataFrame(data=d)

e = {'Name': ['ABC', 'DEF', 'ABC'],
     'ID': [1, 2, 1],
     'Year': [2021, 2020, 2023],
     'Funding': [500, 1000, 17]}
funding_df = pd.DataFrame(data=e)


combined_df = (?)

edit: df e had Output as column name, changed to Funding

edit 2: clarifying that each separate df can have multiple years of data for an ID

Solution

I was able to get this to work using .apply

def get_Funding_info(row):
    return funding_df.loc[( (funding_df['ID']==row['ID']) & (funding_df['Year']==row['Year_Funding']) ), 'Funding'].squeeze()

combined_df['Funding'] = combined_df.apply(get_Funding_info, axis=1)