I've tried to implement the n log n solution to the longest increasing subsequence problem (where you need to find a longest subsequence where each element is larger than the previous one of a given sequence), one which will find the actual subsequence that is longest (not just its length). I've worked off of this video - https://www.youtube.com/watch?v=S9oUiVYEq7E - but sadly I don't think the algorithm shown on the video is correct - it seems to at least work for the exact input that is shown on the video, but doesn't work for others, such as [1, 8, 6, 4, 9, 8, 3, 5, 2, 7, 1, 9, 5, 7].
from bisect import bisect_left, bisect_right
from math import floor, ceil
sequence = [3, 4, -1, 5, 8, 2, 3, 12, 7, 9, 10]
indexes = [0]
helper = [-1] * len(sequence)
for i in range(1, len(sequence)):
if len(indexes) == 0 or sequence[i] > sequence[indexes[-1]]:
indexes.append(i)
helper[i] = indexes[-2]
else:
ceiltable = bisect_right([sequence[x] for x in indexes], sequence[i])
indexes[ceiltable] = i
if ceiltable > 0:
helper[i] = indexes[ceiltable - 1]
solution = [sequence[x] for x in indexes]
print(f"longest increasing subsequence is {solution}, and has a lenght of {len(solution)}")
And my question(s) are - can anyone confirm/disconfirm whether the algorithm shown in that video is actually correct and what might be wrong with my implementation of it? Also, can I ask anyone to provide a simple explanation/pseudocode/mockup of the n log n solution of this problem? I tried searching of course, But I don't think there is anything that really explains how this solution works, or specifically how an implementation of it would work - and again, just to note, I also have to return the actual subsequence, not just the length.
The video you refer to explains he algorithm correctly.
There are two issues in your implementation:
You should use bisect_left
instead of bisect_right
, as otherwise you will allow solutions that are in fact non-decreasing sequences (with potential duplicate values) instead of strictly increasing sequences. And bisect_right
may also result in an index that is equal to the length of the list, resulting in an invalid index access error. (Side note: If you really want to use bisect_right
and find non-decreasing sequences, then make the preceding if
condition >=
)
The code does not translate the gathered data correctly into a solution. You really need to use the helper
list to trace back the solution. Here is the code you could use:
solution = []
i = indexes[-1] # start with the index of the last value of the longest sequence
while i >= 0:
solution.append(sequence[i])
i = helper[i] # Look up what the optimal predecessor is of that index
solution.reverse() # Reverse the solution since we populated it in reverse order
The way you perform binary search is not efficient because you iterate the whole list with a list comprehension. That defeats the efficiency offered by binary search. You should have the values ready for binary search, so keep them in a separate list that you maintain throughout the algorithm, and don't do such a list comprehension any more at the time of the binary search.