[SOLVED] UnionFind with weighted Quick Union (a.k.a Union by rank)

UnionFind with weighted Quick Union (a.k.a Union by rank)

In the Union Find algorithm provided by LeetCode's tutorial, it seems the weight (rank) of the tree is only incremented when merging two equal weight subsets, however not the case when one is greater or less than the other. Can someone explain why?

Please refer to the algorithm below:

private int[] rank;
public void union(int x, int y){
        int rootx = find(x);
        int rooty = find(y);
        if(rootx != rooty){

            if(rank[rootx] > rank[rooty]){
                //notice we don't increment the rank weight of rank[rootx] here
                root[rooty] = rootx; 
            }else if(rank[rootx] < rank[rooty]){
                root[rootx] = rooty;
            }else{
                //only if weights are the same, it is incremented. Why?
                root[rooty] = rootx;
                rank[rootx] += 1;  
            }
            count--;
        }
    }

Solution

The union-find data structure describes a forest, i.e. a collection of disjoint trees. The union operation is then meant to merge two trees together (if the two vertices are not already part of the same tree), and this is done by making the root of one tree into a child of the root of the other tree.

The performance of the find algorithm depends not on the cardinality of a tree (i.e. the number of vertices), but on the height of the tree (i.e. the maximum length of a path from a vertex to the root), since that is the greatest number of iterations required to traverse from a vertex to the root of the tree containing that vertex. So the weight or rank of a tree is its height. There are two cases:

If the tree heights are h1 < h2, then the tree with of height h1 is merged into the tree of height h2. The height of the resulting tree is h2 (if this isn't obvious, think about it), so the rank of the root of that tree is unchanged.
If the tree heights are equal (both h), then after adding the root of one as a child of the root of the other, the height of the new tree will be h + 1, so the rank of the root of the tree is incremented by 1.

This isn't the way it has to be, it's just the way it is in the data structure from your example. For example, if you use the cardinality instead of the height, then you would initialise all the ranks in the array as 1, and when merging two trees, you would do rank[root2] += rank[root1]; regardless of whether the cardinalities were equal or not.

Note also that if the data structure uses path compression, then the rank of a tree will not generally equal its height; it will be an upper bound for the height, because path compression can reduce the height of a tree but there is no efficient way to then update the ranks of all the vertices in the tree. See this other Q&A for a discussion.