algorithmdata-structuresgraphsearch-treetreap

Using "Treap" to compare two set


I want to use Treap structure,but I don't familiar with this type of tree well.

I have two set and i want to write a method to compare them with Treap. This method should return a value that show the similarity of two set. (My work is retrieve a set that is mostly similar to an input set)

How can i do this work?

Thanks


Solution

  • Treap

    A Treap is an example of a Balanced Binary Search Tree (you can use any of them for this problem). The expected height of a Treap containing n elements is O(logn) - expected, because Treap is a randomized data structure.

    The following solution works for any Binary Search Tree, but it performs much better if the Balanced Binary Search Tree is used (e.g. Treap).

    Measure

    One measure of a similarity between two sets is the Jaccard Index. Let's call our sets A and B. The Jaccard Index is defined by:

    enter image description here

    So to compute the Jaccard Index of A and B, we must compute the sum and the intersection of A and B.

    Operations

    Let's assume that A and B are implemented as Balanced Binary Search Trees.

    A Binary Search Tree could support many operations, but three of them are sufficient for this problem:

    In the Balanced Binary Search Tree, find(x) and insert(x) have O(logn) running time, where n is the number of elements in the Tree.

    In addition, during insertion, we can keep track of the size of the Tree, so size() can be implemented in a constant time.

    Of course, we could iterate over all elements of our Tree.

    Pseudocode

    Step 1.

    sum(A, B):
    
        C = A 
    
        foreach x in B:
            C.insert(x)
    
        return C
    

    Step 2.

    intersection(A, B):
    
        C = new BalancedBinarySearchTree()
    
        foreach x in B:
            if(A.find(x) == true):
                C.insert(x)
    
        return C
    

    Step 3.

    Calculate the Jaccard index of A and B:

    JaccardIndex(A, B)
        S = sum(A, B)
        I = intersect(A, B)
    
        return I.size() / S.size()
    

    Complexity

    Let's assume that:

    n = A.size()
    m = B.size()
    

    Then the complexity of computing the sum is O(n + m * log(n + m)), and the complexity of calculating the intersection is O(m * log n).