algorithmsortingbinary-searchcorrectness

Longest Increasing subsequence length in NlogN.[Understanding the Algo]


Problem Statement: Aim is to find the longest increasing subsequence(not contiguous) in nlogn time.

Algorithm: I understood the algorithm as explained here : http://www.geeksforgeeks.org/longest-monotonically-increasing-subsequence-size-n-log-n/.

What i did not understand is what is getting stored in tail in the following code.

int LongestIncreasingSubsequenceLength(std::vector<int> &v) {
if (v.size() == 0)
    return 0;

std::vector<int> tail(v.size(), 0);
int length = 1; // always points empty slot in tail

tail[0] = v[0];
for (size_t i = 1; i < v.size(); i++) {
    if (v[i] < tail[0])
        // new smallest value
        tail[0] = v[i];
    else if (v[i] > tail[length-1])
        // v[i] extends largest subsequence
        tail[length++] = v[i];
    else
        // v[i] will become end candidate of an existing subsequence or
        // Throw away larger elements in all LIS, to make room for upcoming grater elements than v[i]
        // (and also, v[i] would have already appeared in one of LIS, identify the location and replace it)
        tail[CeilIndex(tail, -1, length-1, v[i])] = v[i];
}

return length;
}

For example ,if input is {2,5,3,,11,8,10,13,6}, the code gives correct length as 6. But tail will be storing 2,3,6,8,10,13.

So I want to understand what is stored in tail?.This will help me in understanding correctness of this algo.


Solution

  • tail[i] is the minimal end value of the increasing subsequence (IS) of length i+1.

    That's why tail[0] is the 'smallest value' and why we can increase the value of LIS (length++) when the current value is bigger than end value of the current longest sequence.

    Let's assume that your example is the starting values of the input:

    input = 2, 5, 3, 7, 11, 8, 10, 13, 6, ...

    After 9 steps of our algorithm tail looks like this:
    tail = 2, 3, 6, 8, 10, 13, ...

    What does tail[2] means? It means that the best IS of length 3 ends with tail[2]. And we could build an IS of length 4 expanding it with the number that is bigger than tail[2].

    tail[0] = 2, IS length = 1: 2, 5, 3, 7, 11, 8, 10, 13, 6
    tail[1] = 3, IS length = 2: 2, 5, 3, 7, 11, 8, 10, 13, 6
    tail[2] = 6, IS length = 3: 2, 5, 3, 7, 11, 8, 10, 13, 6
    tail[3] = 8, IS length = 4: 2, 5, 3, 7, 11, 8, 10, 13, 6
    tail[4] = 10,IS length = 5: 2, 5, 3, 7, 11, 8, 10, 13, 6
    tail[5] = 13,IS length = 6: 2, 5, 3, 7, 11, 8, 10, 13, 6

    This presentation allows you to use binary search (note that defined part of tail is always sorted) to update tail and to find the result at the end of the algorithm.