pythonstringlongest-substring

Longest common substring from more than two strings


I'm looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:

Method implemented is not important. It is important it can be used for a set of strings (not only two strings).


Solution

  • These paired functions will find the longest common string in any arbitrary array of strings:

    def long_substr(data):
        substr = ''
        if len(data) > 1 and len(data[0]) > 0:
            for i in range(len(data[0])):
                for j in range(len(data[0])-i+1):
                    if j > len(substr) and is_substr(data[0][i:i+j], data):
                        substr = data[0][i:i+j]
        return substr
    
    def is_substr(find, data):
        if len(data) < 1 and len(find) < 1:
            return False
        for i in range(len(data)):
            if find not in data[i]:
                return False
        return True
    
    
    print long_substr(['Oh, hello, my friend.',
                       'I prefer Jelly Belly beans.',
                       'When hell freezes over!'])
    

    No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.

    EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.

    def long_substr(data):
        substr = ''
        if len(data) > 1 and len(data[0]) > 0:
            for i in range(len(data[0])):
                for j in range(len(data[0])-i+1):
                    if j > len(substr) and all(data[0][i:i+j] in x for x in data):
                        substr = data[0][i:i+j]
        return substr
    

    Hope this helps,

    Jason.