I have a data frame whose column values are lists, and I want to find the difference between two columns. In other words, I want all the elements in column A that are not in column B.
import pandas as pd

data = {'NAME': ['JOHN', 'MARY', 'CHARLIE'],
        'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
        'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]]}
df = pd.DataFrame(data)
df = df[['NAME', 'A', 'B']]
#I'm able to concatenate
df['C']=df['A']+df['B']
      NAME          A          B                   C
0     JOHN  [1, 2, 3]  [2, 3, 4]  [1, 2, 3, 2, 3, 4]
1     MARY  [2, 3, 4]  [3, 4, 5]  [2, 3, 4, 3, 4, 5]
2  CHARLIE  [3, 4, 5]  [4, 5, 6]  [3, 4, 5, 4, 5, 6]
Is there any way to find the differences, along these lines?
df['C']=df['A']-df['B']
I know we can use df.apply with a function, but row-by-row processing will be slow since I have around 400K rows. I'm looking for a straightforward method like
df['C']=df['A']+df['B']
For a set difference,
df['A'].map(set) - df['B'].map(set)
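To make that concrete, here is a minimal sketch using the sample data from the question. The expression above yields a Series of Python sets, so the sketch assumes you want lists back and sorts each set to get a stable order. The column name 'D' and the order-preserving variant are my own additions for illustration, not part of the original question. Note that both versions still loop over the rows in Python under the hood, so I would treat the speed on 400K rows as something to measure rather than assume.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
})

# Row-wise set difference: elements of A that are not in B.
# Each entry of the result is a Python set.
diff_sets = df['A'].map(set) - df['B'].map(set)

# Sets are unordered, so sort each one to get a list with a stable order.
df['C'] = diff_sets.map(sorted)

# Hypothetical alternative (column 'D' is illustrative): keep the original
# order and any duplicates from A by filtering against a set built from B.
df['D'] = [
    [x for x in a if x not in b_set]
    for a, b_set in zip(df['A'], df['B'].map(set))
]

print(df[['NAME', 'C', 'D']])
```

The set version drops duplicates and ordering from A; the list comprehension keeps both, which matters if the lists are not already unique.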