java algorithm lcs

Finding Longest Common Substring with starting indexes


I saw this code implementation here. It takes two strings, finds their longest common substring, and returns its length. I wanted to modify it slightly to also get the starting index of that substring in each of the two strings, but I can't figure out how. I know it should be possible, since we are already working with indexes into the strings. My edited version of the code is below:


public class Main {
    public class Answer {
        int i, j, len;
        Answer(int i, int j, int len) {
            this.i = i;
            this.j = j;
            this.len = len;
        }
    }
    public Answer find(String s1,String s2){

        int n = s1.length();
        int m = s2.length();

        Answer ans = new Answer(0, 0, 0);
        int[] a = new int[m];
        int b[] = new int[m];

        for(int i = 0;i<n;i++){
            for(int j = 0;j<m;j++){
                if(s1.charAt(i)==s2.charAt(j)){
                   if(i==0 || j==0 )a[j] = 1;
                   else{
                       a[j] = b[j-1] + 1;
                   }
                   ans.len = Math.max(ans.len, a[j]);
                   ans.i = i;
                   ans.j = j;
                }

            }
            int[] c = a;
            a = b;
            b = c;
        }
        return ans;
    }
}

Solution

  • I am assuming that for two strings such as s1 = "abcdxyz" and s2 = "xyzabcd", where "abcd" is the longest common substring, you need the index at which that substring starts in each string: 0 in s1 and 3 in s2.

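    There are two solutions for this:

    Solution 1:

    Here I have created an indexes array that stores the starting index in each string: indexes[0] is the start in s1 and indexes[1] is the start in s2. Since a match of length len ends at position i in s1, it starts at (i + 1) - len, and likewise for j in s2. Note that the method as written still returns ans, so indexes would have to be returned or stored somewhere the caller can reach it.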

    public Answer  find(String s1,String s2){
    
        int n = s1.length();
        int m = s2.length();
    
        Answer ans = new Answer(0, 0, 0);
        int[] a = new int[m];
        int b[] = new int[m];
        int indexes[] = new int[2];
        for(int i = 0;i<n;i++){
            for(int j = 0;j<m;j++){
                if(s1.charAt(i)==s2.charAt(j)){
                   if(i==0 || j==0 )a[j] = 1;
                   else{
                       a[j] = b[j-1] + 1;
                   }
                   if(a[j]>ans.len) {
                       ans.len = a[j];
                       indexes[0]=(i+1) - ans.len;
                       indexes[1]=(j+1) - ans.len;
                   }
                   ans.i = i;
                   ans.j = j;
    
                }
    
            }
            int[] c = a;
            a = b;
            b = c;
        }
        return ans;
    }
    

    Solution 2:

    I am not sure what the i and j fields of your Answer object are meant to hold, but they can store these values directly, with i as the start index in s1 and j as the start index in s2, instead of using a separate index array as in Solution 1.

    public Answer  find(String s1,String s2){
    
        int n = s1.length();
        int m = s2.length();
    
        Answer ans = new Answer(0, 0, 0);
        int[] a = new int[m];
        int b[] = new int[m];
        for(int i = 0;i<n;i++){
            for(int j = 0;j<m;j++){
                if(s1.charAt(i)==s2.charAt(j)){
                   if(i==0 || j==0 )a[j] = 1;
                   else{
                       a[j] = b[j-1] + 1;
                   }
                   if(a[j]>ans.len) {
                       ans.len = a[j];
                       ans.i=(i+1) - ans.len;
                       ans.j=(j+1) - ans.len;
                   }
    
                }
    
            }
            int[] c = a;
            a = b;
            b = c;
        }
        return ans;
    }
    

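    As written, though, this doesn't calculate the longest common substring correctly. The issue is that you never clear array a before the inner loop runs again. Because a and b are swapped after every row, a still holds the values written two rows earlier. When the characters don't match, a[j] should be 0, but it keeps that old value, and a later row can then extend it into a match that doesn't exist.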
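To see the stale values in action, here is a minimal sketch that keeps only the length computation from the swapping version. The class name StaleBufferDemo, the method name buggyLength, and the input strings are assumptions chosen for illustration; they are not part of the original code.

```java
// Hypothetical demo class; names and inputs are chosen for illustration only.
class StaleBufferDemo {

    // Length computation from the swapping version: a[j] is written only on a match.
    static int buggyLength(String s1, String s2) {
        int n = s1.length();
        int m = s2.length();
        int best = 0;
        int[] a = new int[m];
        int[] b = new int[m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (s1.charAt(i) == s2.charAt(j)) {
                    a[j] = (i == 0 || j == 0) ? 1 : b[j - 1] + 1;
                    best = Math.max(best, a[j]);
                }
                // No else branch: on a mismatch a[j] keeps the value from two rows earlier.
            }
            int[] c = a;
            a = b;
            b = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // "acxb" and "ab" share only single characters, so the correct length is 1.
        System.out.println(buggyLength("acxb", "ab")); // prints 2
    }
}
```

The 1 written for 'a' in row 0 is still sitting in the buffer when row 2 reuses it, so row 3 sees it as the previous row and extends it when it matches 'b'.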

    Updated code:

    public Answer find(String s1, String s2) {

        int n = s1.length();
        int m = s2.length();

        Answer ans = new Answer(0, 0, 0);
        int[] a;
        int[] b = new int[m];
        for (int i = 0; i < n; i++) {
            a = new int[m]; // fresh, zero-filled row, so mismatches read as 0
            for (int j = 0; j < m; j++) {
                if (s1.charAt(i) == s2.charAt(j)) {
                    if (i == 0 || j == 0) a[j] = 1;
                    else {
                        a[j] = b[j - 1] + 1;
                    }
                    if (a[j] > ans.len) {
                        ans.len = a[j];
                        // the match ends at i and j, so step back to its start
                        ans.i = (i + 1) - ans.len;
                        ans.j = (j + 1) - ans.len;
                    }
                }
            }
            b = a; // the current row becomes the previous row
        }
        return ans;
    }
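For something you can compile and run directly, here is a minimal self-contained sketch of the same idea. The class name LcsWithIndexes, the static nested Answer with final fields, the cur/prev names, and the use of Arrays.fill to clear the reused row (instead of allocating a new array per row) are assumptions made for this sketch, not part of the code above.

```java
import java.util.Arrays;

// Hypothetical wrapper class; only the find logic follows the answer above.
class LcsWithIndexes {

    // i = start index in s1, j = start index in s2, len = length of the common substring.
    static final class Answer {
        final int i, j, len;

        Answer(int i, int j, int len) {
            this.i = i;
            this.j = j;
            this.len = len;
        }
    }

    static Answer find(String s1, String s2) {
        int n = s1.length();
        int m = s2.length();
        int bestI = 0, bestJ = 0, bestLen = 0;
        int[] cur = new int[m];
        int[] prev = new int[m];
        for (int i = 0; i < n; i++) {
            Arrays.fill(cur, 0); // clear values left over from two rows earlier
            for (int j = 0; j < m; j++) {
                if (s1.charAt(i) == s2.charAt(j)) {
                    cur[j] = (i == 0 || j == 0) ? 1 : prev[j - 1] + 1;
                    if (cur[j] > bestLen) {
                        bestLen = cur[j];
                        // the match ends at i and j, so step back to its start
                        bestI = i + 1 - bestLen;
                        bestJ = j + 1 - bestLen;
                    }
                }
            }
            // swap the two rows; cur is cleared at the top of the next iteration
            int[] tmp = cur;
            cur = prev;
            prev = tmp;
        }
        return new Answer(bestI, bestJ, bestLen);
    }

    public static void main(String[] args) {
        Answer r = find("abcdxyz", "xyzabcd");
        System.out.println(r.i + " " + r.j + " " + r.len);          // prints "0 3 4"
        System.out.println("abcdxyz".substring(r.i, r.i + r.len)); // prints "abcd"
    }
}
```

Because the comparison is a strict >, when several common substrings share the maximum length this returns the first one found while scanning s1 from left to right.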