c++, algorithm, disjoint-sets, union-find

Is this Union Find really O(n) as they claim?


I am solving a problem on LeetCode:

Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence. You must write an algorithm that runs in O(n) time. So for nums = [100,4,200,1,3,2], the output is 4 (the longest consecutive sequence is 1, 2, 3, 4).

The Union Find solution for this is below:

class Solution {
public:
    vector<int> parent, sz;

    // Find the root of i's set, compressing the path along the way.
    int find(int i) {
        if(parent[i]==i) return i;
        return parent[i]=find(parent[i]);
    }

    // Union by size: the root of the larger set becomes the parent of the smaller set's root.
    void merge(int i, int j) {
        int p1=find(i);
        int p2=find(j);

        if(p1==p2) return;
        if(sz[p1]>sz[p2]) {
            sz[p1]+=sz[p2];
            parent[p2]=p1;
        } else {
            sz[p2]+=sz[p1];
            parent[p1]=p2;
        }
    }

    int longestConsecutive(vector<int>& nums) {
        sz.resize(nums.size(),1);
        parent.resize(nums.size(),0);

        iota(begin(parent),end(parent),0);

        unordered_map<int, int> m; // value -> index of its first occurrence

        for(int i=0; i<nums.size(); i++) {
            int n=nums[i];
            if(m.count(n)) continue;          // ignore duplicates
            if(m.count(n-1)) merge(i,m[n-1]); // join the set containing n-1
            if(m.count(n+1)) merge(i,m[n+1]); // join the set containing n+1
            m[n]=i;
        }

        // The longest consecutive run is the size of the largest set (check roots only).
        int res=0;
        for(int i=0; i<parent.size(); i++) {
            if(parent[i]==i && sz[i]>res) {
                res=sz[i];
            }
        }

        return res;
    }
};

This gets accepted by the OJ (Runtime: 80 ms, faster than 76.03% of C++ online submissions for Longest Consecutive Sequence), but is this really O(n), as claimed by many answers, such as this one? My understanding is that Union Find is an O(N log N) algorithm.

Are they right, or am I missing something?


Solution

  • They are right. A properly implemented Union Find with path compression and union by rank (or union by size, as in your code) has effectively linear run time as a whole, while any individual operation has amortized near-constant run time. More precisely, the total cost of m operations on n elements is O(m * alpha(n)), where alpha is the inverse Ackermann function. For any n that fits in the physical world, alpha(n) doesn't exceed 4, so we can treat individual operations as constant and the algorithm as a whole as linear. In your solution the main loop issues at most 2n merge calls (plus hash-map lookups, which are O(1) on average), so the total work is O(n * alpha(n)), i.e. O(n) for all practical purposes.
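
    If you want to convince yourself empirically, here is a rough standalone timing sketch (my own, not part of your solution; the struct name and input sizes are arbitrary): it runs roughly 2n union/find operations for growing n, and the total time should scale roughly linearly with n (absolute numbers depend on your machine).

    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <utility>
    #include <vector>

    struct DSU {
        std::vector<int> parent, sz;
        explicit DSU(int n) : parent(n), sz(n, 1) {
            std::iota(parent.begin(), parent.end(), 0);
        }
        int find(int i) {                          // path compression
            return parent[i] == i ? i : parent[i] = find(parent[i]);
        }
        void merge(int i, int j) {                 // union by size
            int p1 = find(i), p2 = find(j);
            if (p1 == p2) return;
            if (sz[p1] < sz[p2]) std::swap(p1, p2);
            sz[p1] += sz[p2];
            parent[p2] = p1;
        }
    };

    int main() {
        for (int n : {1000000, 2000000, 4000000, 8000000}) {
            DSU dsu(n);
            auto start = std::chrono::steady_clock::now();
            for (int i = 1; i < n; i++) dsu.merge(i - 1, i);   // n-1 merges
            for (int i = 0; i < n; i++) dsu.find(i);           // n finds
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("n = %d: %lld ms\n", n, (long long)ms); // grows roughly linearly in n
        }
    }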

    The key part for path compression in your code is here:

    return parent[i]=find(parent[i])
    

    vs the following that doesn't employ path compression:

    return find(parent[i])
    

    What this line does is flatten the tree: every node visited on the way to the root is re-linked directly to that root. Only the first find on a node traverses the full path; subsequent calls get a near-direct hit, because the node's parent was reset to its ultimate root. The second snippet still produces correct results, but it does redundant work on every later call, since the path is never shortened and you only care about the final root, not the path itself.
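
    As a tiny standalone sketch of the flattening effect (my own, not part of your solution): start from a worst-case chain 4 -> 3 -> 2 -> 1 -> 0 and do a single find(4); afterwards every node points straight at the root.

    #include <cstdio>
    #include <vector>

    std::vector<int> parent;

    int find(int i) {
        if (parent[i] == i) return i;
        return parent[i] = find(parent[i]);   // re-link i directly to the root
    }

    int main() {
        parent = {0, 0, 1, 2, 3};             // chain: 4 -> 3 -> 2 -> 1 -> 0
        find(4);                              // walks the whole chain once...
        for (int p : parent) std::printf("%d ", p);
        std::printf("\n");                    // ...and prints "0 0 0 0 0": flattened
    }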

    Union by size (a close cousin of union by rank; your code compares subtree sizes rather than ranks, and both give the same asymptotic guarantee) is here:

    if(sz[p1]>sz[p2]) {...
    

    It makes sure that the root of the larger tree (the one with more nodes) becomes the new root, so it is the smaller group of nodes that ends up one level farther from the root. That keeps the trees shallow, so find has less work to do.
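
    For comparison, here is a minimal sketch (my own, assumed) of union by rank proper, where rank is an upper bound on the tree height rather than the subtree size used in your code; combined with path compression it gives the same O(m * alpha(n)) bound.

    #include <numeric>
    #include <utility>
    #include <vector>

    struct DSU {
        std::vector<int> parent, rnk;
        explicit DSU(int n) : parent(n), rnk(n, 0) {
            std::iota(parent.begin(), parent.end(), 0);
        }
        int find(int i) {
            if (parent[i] == i) return i;
            return parent[i] = find(parent[i]);    // path compression, as in your code
        }
        void merge(int i, int j) {
            int p1 = find(i), p2 = find(j);
            if (p1 == p2) return;
            if (rnk[p1] < rnk[p2]) std::swap(p1, p2);
            parent[p2] = p1;                       // shorter tree goes under the taller one
            if (rnk[p1] == rnk[p2]) rnk[p1]++;     // equal heights: the surviving root's rank grows by one
        }
    };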

    Note: The above was updated and corrected based on feedback from @Matt-Timmermans and @kcsquared.