pandasdataframejoinduplicatesouter-join

Merging Pandas Dataframe with no duplicated columns on one key


I am new to Pandas and have tried reviewing different solutions on here but I am having some issues with joining two data frames. They are joining ok, however for some for some columns I would like it to to remain as NULL instead of duplicating

Sample DF1

employee year
x12345 2023
x12345 2022
z12345 2015
z12345 2013

Sample DF2

employee status
x12345 active
z12345 active
m12345 inactive
q12345 active

What I have tried

merged = pd.merge(DF1, DF2, how="outer", on=["employee"]

What I currently have

employee year status
x12345 2023 active
x12345 2022 active
z12345 2015 active
z12345 2013 active

What I need

employee year status
x12345 2023 active
x12345 2022 NULL
z12345 2015 active
z12345 2013 NULL

What I have tried

merged = pd.merge(DF1, DF2, how="outer", on=["employee"]


Solution

  • Code

    df1.assign(key=df1.groupby('employee').cumcount())\
       .merge(df2.assign(key=df2.groupby('employee').cumcount()), how='left')\
       .drop('key', axis=1)
    

    out:

    employee    year    status
    0   x12345  2023    active
    1   x12345  2022    NaN
    2   z12345  2015    active
    3   z12345  2013    NaN