pythonpandasjoinmerge

Pandas merge one-to-many


I'm trying to merge two pandas DataFrames on multiple columns. It is a many-to-one relationship. There are many of the same values in df1 but only value in df2.

These are the example DataFrames :

df1 = pd.DataFrame({'date':['2025-03-01','2025-03-01', '2025-03-02'],'period':[1,1,2], 'company':['aa','aa','b']})

df2 = pd.DataFrame({'date':['2025-03-01', '2025-03-02'],'period':[1,2],'company':['aa','b'], 'value':[4,8]})

The result I want is this (where the value '4' is only joined once on the first match and subsequent rows are filled with empty space or null):

         date  period company   value
0  2025-03-01       1      aa     4
1  2025-03-01       1      aa    null
2  2025-03-02       2       b     8

I'm trying a left join - but I get the incorrect output:

         date  period company  value
0  2025-03-01       1      aa      4
1  2025-03-01       1      aa      4
2  2025-03-02       2       b      8

Solution

  • merged = df1.merge(df2, on=['date','period','company'], how='left')
    merged.loc[merged.duplicated(subset=['date','period','company']), 'value'] = np.nan
    

    This worked for me, and it would work cleanly with large data sets as well. I replicated the code in google colab and got the following result.

    enter image description here