python pandas dataframe isin

Flag column values that are not present in another dataframe


I have a benchmark df_1:

Col_1   insight_id    Col_2    Col_n
24249       ABC123      656      AAA
24249       ABC123      670      AXA
22549       ABC124      656      AAC
24249       ABC124      656      ADA
24236       ABC125      656      AAA

And a dataset df_2:

Col_a   insight_id    Col_b    Col_x
24299       ABC123      956      XAA
24299       ABC123      970      AXX
24299       ABC125      954      AAX
24299       ABC125      956      AXX

How do I mark the insight_ids that are not present in the second dataset? I know about:

df_1.loc[df_1['insight_id'].isin(df_2['insight_id'])]

But that returns the rows whose insight_id is present in df_2, which is the opposite of my expected output. In this case I expect:

insight_id
    ABC124

Solution

  • Negate the condition with ~ to keep the rows whose insight_id is absent from df_2, then drop the duplicates:

    cond = df_1["insight_id"].isin(df_2["insight_id"])
    df_1.loc[~cond, "insight_id"].drop_duplicates()
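
For reference, here is a minimal, self-contained sketch that rebuilds the two frames from the question and applies the negated mask. Since the title asks to flag the values, it also stores the mask as a boolean column; the column name missing_in_df_2 is made up for illustration and is not part of the original data.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    "Col_1": [24249, 24249, 22549, 24249, 24236],
    "insight_id": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125"],
    "Col_2": [656, 670, 656, 656, 656],
    "Col_n": ["AAA", "AXA", "AAC", "ADA", "AAA"],
})
df_2 = pd.DataFrame({
    "Col_a": [24299, 24299, 24299, 24299],
    "insight_id": ["ABC123", "ABC123", "ABC125", "ABC125"],
    "Col_b": [956, 970, 954, 956],
    "Col_x": ["XAA", "AXX", "AAX", "AXX"],
})

# True where the insight_id of df_1 also occurs in df_2.
cond = df_1["insight_id"].isin(df_2["insight_id"])

# Unique insight_ids of df_1 that never occur in df_2.
missing = df_1.loc[~cond, "insight_id"].drop_duplicates()

# Optional: keep every row and flag it instead of filtering.
# "missing_in_df_2" is a hypothetical column name chosen for this sketch.
df_1["missing_in_df_2"] = ~cond

print(missing.tolist())  # ['ABC124']
```

Filtering with ~cond gives the list of missing ids, while the extra column keeps all rows of df_1 and marks which ones have no match.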