algorithmunion-find

UnionFind with weighted Quick Union (a.k.a Union by rank)


In the Union Find algorithm provided by LeetCode's tutorial, it seems the weight (rank) of the tree is only incremented when merging two equal weight subsets, however not the case when one is greater or less than the other. Can someone explain why?

Please refer to the algorithm below:

private int[] rank;
public void union(int x, int y){
        int rootx = find(x);
        int rooty = find(y);
        if(rootx != rooty){

            if(rank[rootx] > rank[rooty]){
                //notice we don't increment the rank weight of rank[rootx] here
                root[rooty] = rootx; 
            }else if(rank[rootx] < rank[rooty]){
                root[rootx] = rooty;
            }else{
                //only if weights are the same, it is incremented. Why?
                root[rooty] = rootx;
                rank[rootx] += 1;  
            }
            count--;
        }
    }

Solution

  • The union-find data structure describes a forest, i.e. a collection of disjoint trees. The union operation is then meant to merge two trees together (if the two vertices are not already part of the same tree), and this is done by making the root of one tree into a child of the root of the other tree.

    The performance of the find algorithm depends not on the cardinality of a tree (i.e. the number of vertices), but on the height of the tree (i.e. the maximum length of a path from a vertex to the root), since that is the greatest number of iterations required to traverse from a vertex to the root of the tree containing that vertex. So the weight or rank of a tree is its height. There are two cases:

    This isn't the way it has to be, it's just the way it is in the data structure from your example. For example, if you use the cardinality instead of the height, then you would initialise all the ranks in the array as 1, and when merging two trees, you would do rank[root2] += rank[root1]; regardless of whether the cardinalities were equal or not.

    Note also that if the data structure uses path compression, then the rank of a tree will not generally equal its height; it will be an upper bound for the height, because path compression can reduce the height of a tree but there is no efficient way to then update the ranks of all the vertices in the tree. See this other Q&A for a discussion.