algorithmgraphdisjoint-setsunion-find

union by size vs union by rank in disjoint-set algorithm


The question in short: does union by rank work only in some circumstance?

For example, the input is: (1, 2), (3,4), (3,5), after we run a disjoint-set algorithm on them:

one of the correct output should be (1,2), (3,4,5). And the roots (the representatives of the 2 sets) are 2 and 4, respectively. The forest should look like this:

2                 4
 \               / \
  1             3   5

Here comes the problem, if I need to sort the output by its size in descending order(so larger size comes first):

If I use union by size(simply its number of descendants including the node itself), size(2) == 2, size(4) == 3, so the right set should come first. All good;

If I use union by rank(an upper bound for its height), rank(2) == 1, rank(4) == 1, left and right set are equal and this is not what I want. So union by rank fails this case, NO good!

Why is that? Does that mean union by rank has its limitations? If so, under what circumstances we can use union by rank? Is union by size more general?


Solution

  • The point of both the union-by-rank heuristic and the union-by-size heuristic is to minimize the expected running time of the Merge operation. They are not intended for any other purpose.

    Your use case apparently involves sorting the sets by size, an unusual but not unheard-of thing to want. If you use the union-by-size heuristic, you can do this without additional work. (That doesn't redound to the asymptotic complexity, as the size of each set could be trivially maintained even while doing union-by-rank.) So for this use case it sounds like union-by-size is more convenient for you. But both of them work in the context they were designed for.