I have a problem in understanding the node structure for the calculation of the complete longest increasing subsequence (lis) in the paper "Heaviest Increasing/Common Subsequence Problems" by Jacobson and Vo.
Here is the pseudo code from the paper:
What is meant with
node is an auxiliary array that, for each element in L, contains a record of an element that precedes this element in an increasing subsequence. The function newnode() constructs such records and links them into a directed graph. At the end of the algorithm, we can search from the maximal element of L to recover an LIS of sigma.
? How would you implement this structure?
Do I have to construct a directed graph with all the elements of the sequence as vertices (plus a nil-vertex) and edges "\sigma_i -> s" and then search for the longest path starting at the maximal element of L (and ending at nil)? Isn't there a more effective way to get the complete lis?
My second question: Is this algorithm as fast as the algorithm described in wikipedia? If not: Can I modify the algorithm from wikipedia as well to calculate heaviest common subsequences as described in the paper?
I'd implement the array with an array and the graph as a singly-linked list with structure sharing. To wit, in generic C++,
#include <algorithm>
#include <iostream>
#include <memory>
#include <utility>
#include <vector>
template <typename T>
struct Node {
explicit Node(const T& e) : element(e) {}
T element;
std::size_t index = 0;
Node* previous = nullptr;
};
template <typename T, typename Compare>
std::vector<T> LongestIncreasingSubsequence(const std::vector<T>& elements,
Compare compare) {
if (elements.empty()) {
return {};
}
std::vector<std::unique_ptr<Node<T>>> node_ownership;
node_ownership.reserve(elements.size());
std::vector<Node<T>*> tableau;
for (const T& element : elements) {
auto node = std::make_unique<Node<T>>(element);
auto it = std::lower_bound(tableau.begin(), tableau.end(), node.get(),
[&](const Node<T>* a, const Node<T>* b) {
return compare(a->element, b->element);
});
if (it != tableau.begin()) {
auto previous = it[-1];
node->index = previous->index + 1;
node->previous = previous;
}
if (it != tableau.end()) {
*it = node.get();
} else {
tableau.push_back(node.get());
}
node_ownership.push_back(std::move(node));
}
Node<T>* longest = *std::max_element(
tableau.begin(), tableau.end(),
[](Node<T>* a, Node<T>* b) { return a->index < b->index; });
std::vector<T> result(longest->index + 1);
for (; longest != nullptr; longest = longest->previous) {
result.at(longest->index) = longest->element;
}
return result;
}
int main() {
for (int x : LongestIncreasingSubsequence(
std::vector<int>{3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3},
std::less<int>())) {
std::cout << x << '\n';
}
}
If you're fortunate enough to be working in a language with garbage collection, you can ignore the business with node_ownership
and std::move
.
Here's a Python version.
import bisect
def longest_increasing_subsequence(elements):
elements = list(elements)
if not elements:
return []
# Build the tableau
tableau_elements = []
tableau_indexes = []
predecessors = []
for i, element in enumerate(elements):
j = bisect.bisect_left(tableau_elements, element)
predecessors.append(tableau_indexes[j - 1] if j > 0 else None)
if j < len(tableau_elements):
tableau_elements[j] = element
tableau_indexes[j] = i
else:
tableau_elements.append(element)
tableau_indexes.append(i)
# Find the subsequence lengths
lengths = []
for i, predecessor in enumerate(predecessors):
lengths.append(1 if predecessor is None else lengths[predecessor] + 1)
# Extract the best subsequence
best = max(range(len(lengths)), key=lambda i: lengths[i])
subsequence = []
while best is not None:
subsequence.append(elements[best])
best = predecessors[best]
subsequence.reverse()
return subsequence
print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]))