algorithmtime-complexitygraph-theorygraph-algorithmunion-find

Why is the time complexity of performing n union find (union by size) operations O(n log n)?


In Tree based Implementation of Union Find operation, each element is stored in a node, which contains a pointer to a set name. A node v whose set pointer points back to v is also a set name. Each set is a tree, rooted at a node with a self-referencing set pointer.

To perform a union, we simply make the root of one tree point to the root of the other. To perform a find, we follow set name pointers from the starting node until reaching a node whose set name pointer refers back to itself.

In Union by size -> When performing a union, we make the root of smaller tree point to the root of the larger. This implies O(n log n) time for performing n union find operations. Each time we follow a pointer, we are going to a subtree of size at most double the size of the previous subtree. Thus, we will follow at most O(log n) pointers for any find.

I do not understand how for each union operation, Find operation is always O(log n). Can someone please explain how the worst case complexity is actually computed?


Solution

  • Let's assume for the moment, that each tree of height h contains at least 2^h nodes. What happens, if you join two such trees?

    If they are of different height, the height of the combined tree is the same as the height of the higher one, thus the new tree still has more than 2^h nodes (same height but more nodes).

    Now if they are the same height, the resulting tree will increase its height by one, and will contain at least 2^h + 2^h = 2^(h+1) nodes. So the condition will still hold.

    The most basic trees (1 node, height 0) also fulfill the condition. It follows, that all trees that can be constructed by joining two trees together fulfill it as well.

    Now the height is just the maximal number of steps to follow during a find. If a tree has n nodes and height h (n >= 2^h) this gives immediately log2(n) >= h >= steps.