pythontime-complexitybig-olcs

Longest Common Sequence O(mn) of two strings without string concatenation


I was under the impression that the following function has a time complexity of O(mn) with m and n being the length of the two strings. However, someone disagrees because he claims that string concatenation involves the copy of characters and hence for long string sequences this won't be O(mn) anymore. That sounds reasonable. What would be a Python bottom-up implementation that won't concatenate strings? Below is an implementation that involves string concatenation.

def lcs_dynamic_programming(s1, s2):
matrix = [["" for x in range(len(s2))] for x in range(len(s1))]
print(matrix)
for i in range(len(s1)):
    for j in range(len(s2)):
        if s1[i] == s2[j]:
            if i == 0 or j == 0:
                matrix[i][j] = s1[i]
            else:
                matrix[i][j] = matrix[i-1][j-1] + s1[i]
        else:
            matrix[i][j] = max(matrix[i-1][j], matrix[i][j-1], key=len)

cs = matrix[-1][-1]

return len(cs), cs

Edit: For example, consider s1 = "thisisatest", s2 = "testing123testing" with an lcs of "tsitest"


Solution

  • String concatenation costs O(n) because strings are immutable in Python and have to be copied to a new string for every concatenation. To avoid string concatenation, you can use a list of characters in place of a string in the matrix construction, and join the final list of characters into a string only upon return:

    def lcs_dynamic_programming(s1, s2):
        matrix = [[[] for x in range(len(s2))] for x in range(len(s1))]
        for i in range(len(s1)):
            for j in range(len(s2)):
                if s1[i] == s2[j]:
                    if i == 0 or j == 0:
                        matrix[i][j] = [s1[i]]
                    else:
                        matrix[i][j] = matrix[i - 1][j - 1] + [s1[i]]
                else:
                    matrix[i][j] = max(matrix[i - 1][j], matrix[i][j - 1], key=len)
        cs = matrix[-1][-1]
        return len(cs), ''.join(cs)