I have a benchmark df_1:
Col_1 insight_id Col_2 Col_n
24249 ABC123 656 AAA
24249 ABC123 670 AXA
22549 ABC124 656 AAC
24249 ABC124 656 ADA
24236 ABC125 656 AAA
And a dataset df_2:
Col_a insight_id Col_b Col_x
24299 ABC123 956 XAA
24299 ABC123 970 AXX
24299 ABC125 954 AAX
24299 ABC125 956 AXX
How do I mark the insight_ids that are not present in the second dataset? I know about:
df_1.loc[df_1['insight_id'].isin(df_2['insight_id'])]
But that doesn't give my expected output, which in this case is:
insight_id
ABC124
You can negate the condition with ~, then select the insight_id column and drop the duplicates:
cond = df_1["insight_id"].isin(df_2["insight_id"])
df_1.loc[~cond, "insight_id"].drop_duplicates()
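For reference, here is a self-contained sketch of the same idea, with both frames rebuilt from the sample data in the question. The names missing, marked and in_df_2 are only illustrative, and the flag column is an assumption about what "mark" could mean here, not something the question specifies.

```python
import pandas as pd

# Sample frames rebuilt from the data shown in the question
df_1 = pd.DataFrame({
    "Col_1": [24249, 24249, 22549, 24249, 24236],
    "insight_id": ["ABC123", "ABC123", "ABC124", "ABC124", "ABC125"],
    "Col_2": [656, 670, 656, 656, 656],
    "Col_n": ["AAA", "AXA", "AAC", "ADA", "AAA"],
})
df_2 = pd.DataFrame({
    "Col_a": [24299, 24299, 24299, 24299],
    "insight_id": ["ABC123", "ABC123", "ABC125", "ABC125"],
    "Col_b": [956, 970, 954, 956],
    "Col_x": ["XAA", "AXX", "AAX", "AXX"],
})

# True where the df_1 id also appears somewhere in df_2
cond = df_1["insight_id"].isin(df_2["insight_id"])

# Negate with ~ to keep only the ids missing from df_2, one row per id
missing = df_1.loc[~cond, "insight_id"].drop_duplicates()
print(missing.tolist())  # ['ABC124']

# Optional: keep every row of df_1 and add a flag column instead
# (in_df_2 is a hypothetical column name)
marked = df_1.assign(in_df_2=cond)
```

If you need a DataFrame with a single insight_id column rather than a Series, calling missing.to_frame() on the result should do it.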