I need rows from df1 doesn't exist in df2 based on 3 columns [Time1, ID1, Order1]. I need df3 has rows of df1 don't exist on df2 Note: Time1 is in datetime format,
Example
input
df1
Time1 | ID1 | Order1 |
---|---|---|
12/14/2022 6:10:32 PM | X | A |
9/15/2022 2:45:57 AM | Y | B |
9/15/2022 11:08:26 AM | Z | C |
df2
Time2 | ID2 | Order2 |
---|---|---|
12/14/2022 6:15:35 PM | X | A |
12/14/2022 6:00:35 PM | Y | B |
9/15/2022 2:45:57 AM | Y | B |
output df3
Time1 | ID1 | Order1 |
---|---|---|
12/14/2022 6:10:32 PM | X | A |
9/15/2022 11:08:26 AM | Z | C |
Use a merge
with indicator and indexing:
df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
right_on=['Time2', 'ID2', 'Order2'],
how='left', indicator=True)
.query('_merge == "left_only"').index
]
Or:
df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
right_on=['Time2', 'ID2', 'Order2'],
how='left', indicator=True)
['_merge'].eq('left_only')
]
Output:
Time1 ID1 Order1
0 12/14/2022 6:10:32 PM X A
2 9/15/2022 11:08:26 AM Z C