algorithm  data-structures  time-complexity  union-find  iterated-logarithm

Weighted quick-union with path compression algorithm: time complexity analysis


I'm learning the "Weighted quick-union with path compression" algorithm for a union/find structure. The Princeton course site explains the algorithm in detail. Here is the implementation in Java:

public class WQUPC {
  private int[] id; // id[i] is the parent of i; a root satisfies id[i] == i
  private int[] sz; // sz[i] is the number of nodes in the tree rooted at i

  public WQUPC(int N) {
    id = new int[N];
    sz = new int[N];
    for (int i = 0; i < N; i++) {
      id[i] = i;
      sz[i] = 1;
    }
  }

  int root(int i) {
    while (i != id[i]) {
      id[i] = id[id[i]]; // path compression (halving): link i to its grandparent
      i = id[i];
    }
    return i;
  }

  boolean connected(int p, int q) { return root(p) == root(q); }

  void union(int p, int q) {
    int i = root(p);
    int j = root(q);
    if (i == j) return; // already in the same set; avoids corrupting sz
    if (sz[i] < sz[j]) { // attach the smaller tree under the larger root
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }
  }
}
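
For instance, here is a minimal usage sketch (a main method one could add to the class; the element count and the union pairs are arbitrary, not from the course code):

public static void main(String[] args) {
  WQUPC uf = new WQUPC(10); // elements 0..9 start in singleton sets
  uf.union(3, 4);
  uf.union(4, 9);
  uf.union(8, 0);
  System.out.println(uf.connected(3, 9)); // true: 3-4 and 4-9 were merged
  System.out.println(uf.connected(3, 8)); // false: separate components
}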

But, as the website mentions about its performance:

Theorem: Starting from an empty data structure, any sequence of M union and find operations on N objects takes O(N + M lg* N) time.

• Proof is very difficult.

• But the algorithm is still simple!

But I'm still curious where the iterated logarithm lg* N comes from. How is it derived? Can someone prove it, or just explain it in an intuitive way?
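
For context, lg* N is the number of times lg has to be applied before the result drops to at most 1, so it grows extremely slowly. A small sketch of that definition (the helper name iteratedLog is mine, not from the course material):

static int iteratedLog(double n) {
  // Count how many times lg must be applied before n drops to <= 1.
  int count = 0;
  while (n > 1) {
    n = Math.log(n) / Math.log(2); // lg n
    count++;
  }
  return count;
}
// iteratedLog(16) == 3 and iteratedLog(65536) == 4; even for
// n = 2^65536 the value is only 5, so lg* N is practically a constant.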


Solution

  • To begin with, your question has a slight mistake: the complexity with just path compression is only O(m log(n)) (without the iterated log). See, for example, exercise 21.4-4 in Introduction to Algorithms. In fact, your block

        if (sz[i] < sz[j]) {
          id[i] = j;
          sz[j] += sz[i];
        } else {
          id[j] = i;
          sz[i] += sz[j];
        }
    

    does union by size (weighting by sz), which plays the same role as union by rank in the analysis; a rank-based variant is sketched below.
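
    For comparison, a rank-based union could look like the following sketch (the rank[] array is hypothetical here; it stores an upper bound on the height of each root's tree):

        void unionByRank(int p, int q) {
          int i = root(p);
          int j = root(q);
          if (i == j) return;
          if (rank[i] < rank[j]) {
            id[i] = j;          // attach the shorter tree below the taller one
          } else if (rank[i] > rank[j]) {
            id[j] = i;
          } else {
            id[j] = i;          // equal ranks: pick a root and increment its rank
            rank[i]++;
          }
        }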

    With both union by rank and path compression, though, the bound you quoted can be proved easily (much more easily than the inverse Ackermann one). The proof is based on three points:

    1. On each leaf-to-root path, the ranks of the nodes are strictly increasing. This relies on union by rank, and with it, it is very easy to show.

    2. If the root of a tree has rank r, the tree has at least 2^r nodes. This can be shown by induction (see the sketch after this list).

    Based on 2., it's possible to show

    3. The maximal number of nodes with rank r is at most n / 2^r.
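
    A compact sketch of the counting behind 2. and 3. (standard textbook reasoning, written here in LaTeX; not part of the original answer): a root's rank rises to r only when two roots of rank r-1 are linked, and distinct nodes of rank r have disjoint subtrees, so

        \[
          |T_r| \;\ge\; 2\,|T_{r-1}| \;\ge\; 2 \cdot 2^{r-1} \;=\; 2^r,
          \qquad
          \#\{\, v : \operatorname{rank}(v) = r \,\} \;\le\; \frac{n}{2^r}.
        \]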

    The rest of the proof now follows by bucketing the ranks into lg* n groups and charging each step of a find either to the find itself (steps that cross a group boundary, at most lg* n per find) or to the node (steps within a group, whose total is bounded per group using 3.). Even the worst possible arrangement of ranks cannot make too many nodes expensive. For further details, see the Wikipedia entry on the disjoint-set data structure.