I want to use Treap structure,but I don't familiar with this type of tree well.
I have two set and i want to write a method to compare them with Treap. This method should return a value that show the similarity of two set. (My work is retrieve a set that is mostly similar to an input set)
How can i do this work?
Thanks
A Treap is an example of a Balanced Binary Search Tree (you can use any of them for this problem). The expected height of a Treap containing n elements is O(logn) - expected, because Treap is a randomized data structure.
The following solution works for any Binary Search Tree, but it performs much better if the Balanced Binary Search Tree is used (e.g. Treap).
One measure of a similarity between two sets is the Jaccard Index. Let's call our sets A and B. The Jaccard Index is defined by:
So to compute the Jaccard Index of A and B, we must compute the sum and the intersection of A and B.
Let's assume that A and B are implemented as Balanced Binary Search Trees.
A Binary Search Tree could support many operations, but three of them are sufficient for this problem:
In the Balanced Binary Search Tree, find(x) and insert(x) have O(logn) running time, where n is the number of elements in the Tree.
In addition, during insertion, we can keep track of the size of the Tree, so size() can be implemented in a constant time.
Of course, we could iterate over all elements of our Tree.
Step 1.
sum(A, B):
C = A
foreach x in B:
C.insert(x)
return C
Step 2.
intersection(A, B):
C = new BalancedBinarySearchTree()
foreach x in B:
if(A.find(x) == true):
C.insert(x)
return C
Step 3.
Calculate the Jaccard index of A and B:
JaccardIndex(A, B)
S = sum(A, B)
I = intersect(A, B)
return I.size() / S.size()
Let's assume that:
n = A.size()
m = B.size()
Then the complexity of computing the sum is O(n + m * log(n + m)), and the complexity of calculating the intersection is O(m * log n).