pandasdataframeisin

Retrieve rows from dataframe don't exit in another dataframe pandas


I need rows from df1 doesn't exist in df2 based on 3 columns [Time1, ID1, Order1]. I need df3 has rows of df1 don't exist on df2 Note: Time1 is in datetime format,

Example

input

df1

Time1 ID1 Order1
12/14/2022 6:10:32 PM X A
9/15/2022 2:45:57 AM Y B
9/15/2022 11:08:26 AM Z C

df2

Time2 ID2 Order2
12/14/2022 6:15:35 PM X A
12/14/2022 6:00:35 PM Y B
9/15/2022 2:45:57 AM Y B

output df3

Time1 ID1 Order1
12/14/2022 6:10:32 PM X A
9/15/2022 11:08:26 AM Z C

Solution

  • Use a merge with indicator and indexing:

    df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
                                 right_on=['Time2', 'ID2', 'Order2'],
                            how='left', indicator=True)
                     .query('_merge == "left_only"').index
                 ]
    

    Or:

    df3 = df1.loc[df1.merge(df2, left_on=['Time1', 'ID1', 'Order1'],
                                 right_on=['Time2', 'ID2', 'Order2'],
                            how='left', indicator=True)
                     ['_merge'].eq('left_only')
                 ]
    

    Output:

                       Time1 ID1 Order1
    0  12/14/2022 6:10:32 PM   X      A
    2  9/15/2022 11:08:26 AM   Z      C