pythonalgorithmdynamic-programminglcs

Longest common subsequence of 3+ strings


I am trying to find the longest common subsequence of 3 or more strings. The Wikipedia article has a great description of how to do this for 2 strings, but I'm a little unsure of how to extend this to 3 or more strings.

There are plenty of libraries for finding the LCS of 2 strings, so I'd like to use one of them if possible. If I have 3 strings A, B and C, is it valid to find the LCS of A and B as X, and then find the LCS of X and C, or is this the wrong way to do it?

I've implemented it in Python as follows:

import difflib

def lcs(str1, str2):
    sm = difflib.SequenceMatcher()
    sm.set_seqs(str1, str2)
    matching_blocks = [str1[m.a:m.a+m.size] for m in sm.get_matching_blocks()]
    return "".join(matching_blocks)

print reduce(lcs, ['abacbdab', 'bdcaba', 'cbacaa'])

This outputs "ba", however it should be "baa".


Solution

  • Just generalize the recurrence relation.

    For three strings:

    dp[i, j, k] = 1 + dp[i - 1, j - 1, k - 1] if A[i] = B[j] = C[k]
                  max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise
    

    Should be easy to generalize to more strings from this.