If I have two Series objects, like so: [0,0,1] and [1,0,0], how would I get the intersection and union of the two? They contain only booleans, so the values are not unique.
I have a large Boolean matrix. I've minhashed it and now I'm trying to find the false positives and negatives which I think means that I have to get the Jaccard similarity for each original pair.
Since you say they are booleans, use numpy's logical_and and logical_or, or the & and | operators on the Series, i.e.:
import numpy as np
import pandas as pd

y1 = pd.Series([1,0,1,0])
y2 = pd.Series([1,0,0,1])
# Numpy approach
intersection = np.logical_and(y1.values, y2.values)
union = np.logical_or(y1.values, y2.values)
intersection.sum() / union.sum()
# 0.33333333333333331
# Pandas approach
sum(y1 & y2) / sum(y1 | y2)
# 0.33333333333333331
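Since the question is about Jaccard similarity for every pair in a large Boolean matrix, here is a minimal sketch that wraps the numpy approach above in a function and adds an all-pairs version. The names jaccard and pairwise_jaccard are made up for illustration, and I'm assuming the items you want to compare are the rows of the matrix (transpose it first if they are columns). Returning NaN when both inputs are all False is also my assumption, since the similarity is undefined there.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two equal-length boolean sequences (Series or arrays)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both inputs are all False: undefined, reported as NaN (assumption).
        return float("nan")
    return np.logical_and(a, b).sum() / union

def pairwise_jaccard(matrix):
    """Jaccard similarity between every pair of rows of a boolean matrix."""
    m = np.asarray(matrix, dtype=bool).astype(np.int64)
    # inter[i, j] counts the positions where rows i and j are both True.
    inter = m @ m.T
    # Number of True values in each row.
    counts = m.sum(axis=1)
    # |A or B| = |A| + |B| - |A and B|
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, np.nan)

rows = [[1, 0, 1, 0],
        [1, 0, 0, 1]]
sim = pairwise_jaccard(rows)
# sim[0, 1] is the same 1/3 computed above for y1 and y2
```

The matrix product avoids a Python loop over pairs, but the result is an n x n array, so for a very large number of rows you may need to process it in blocks.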