rubyalgorithmlongest-substringlongest-prefix

All Common Subsequences in Array of Strings


I'm trying to find ALL common subsequences in an array of strings in ruby not just the longest single subsequence.

This means that if the input is

["aaaF hello", "aaaG hello", "aaaH hello"]

The expected output is

["aaa", " hello"]

I've been messing with the longest single subsequence algorithm, but can't figure out how to get the appropriate output. The problem with most approaches is they have other elements in the final array like "a", "aa", "h", "he", "hel", "hell"


Solution

  • Well, these smaller subsequences like "a" and "aa" are common subsequences, so that wouldn't be incorrect.

    If you really only want the longest common subsequences (i.e. those subsequences not contained in any other common subsequence), what you need to do is to check whether these subsequences are part of the larger common subsequence, and if so, discard them.

    This can be done by (in pseudocode)

    finalsubsequences = copy(subsequences);
    for(subseq in subsequences) {
       for(subseq2 in subsequences) {
           if(subseq2.indexOf(subseq) != false)
           // subseq contains subseq2, thus discard subseq2
           finalsubsequences.remove(subseq2);
       }
    }
    

    Good luck!