c++stringalgorithmsubstringlongest-substring

Longest substring that appears at least twice in O(n.logn)


Problem:

Given a String S of N characters (N <= 200 000), find the length of the longest substring that appears at least twice (the substrings can overlap).

My solution:

Here's what i've tried:

int main()
{
    std::string s;
    std::cin >> s;
    int max = 0;
    typedef std::string::const_iterator sit;
    sit end = s.end();
    for(sit it1 = s.begin(); it1 != end; ++it1)
        for(sit it2 = it1 + 1; it2 != end; ++it2)
            max = std::max(max, std::mismatch(it1, it1 + (end - it2), it2).first - it1);
    std::cout << max;
}

Question:

However, the above solution will get TLE as it runs in O(n^3). Is there any ways to improve it so it can run in O(n.logn)?


Solution

  • Suffix tree is an overkill for this problem. In fact, binary search suffices and the implementation is much easier.

    Idea

    The idea is simple: If there exists a duplicated substring of length N (N > 1), there must also exists one of length N - 1. Therefore, if we let f(x) to denote a function that returns true if there exists a duplicated substring of length x, f(x) will be a monotonic function, which allows a binary search solution.

    Implementation

    Binary search on the length of the longest duplicated substring and apply sliding windows to check if a given length is possible. Use string hashing to check for duplicate. Complexity: N log N