c++algorithmdynamic-programminglongest-substring

My "Longest Common substring with at most k mismatches" algorithm with O(m*n) space complexity is giving wrong answers with large string inputs


I am new to this platform so please bear with me. My program was based on the longest substring problem with no mismatches using Dynamic Programming. Here is a link to the D-P solution.

https://www.geeksforgeeks.org/longest-common-substring-dp-29/.

Here is a link to the original problem.

https://www.hackerrank.com/challenges/substring-diff/problem

I basically used the table in the d-p solution to store the number of mismatches in the two substrings rather than using it to calculate the number of matches as it was used in the original problem. for all values in the table less than or equal to I stored the maximum possible length and printed it.

Though my solution works fine with small test cases it gives wrong answers in the larger ones.

Here is my code in c++.

int substringDiff(int k, string s1, string s2) {
vector <vector<int>> v(s1.size()+1,vector<int>(s2.size()+1,0));
int len=0;
for(int i=1;i<=s1.size();i++) 
{
for(int j=1;j<=s2.size();j++)
{
    if(s1[i-1]==s2[j-1])
        v[i][j]=v[i-1][j-1];
    else
        v[i][j]=v[i-1][j-1]+1;
    if(v[i][j]<=k)
        len=max(len,min(i,j));
 }
}

return len;
}

I searched on the web and found some better solutions but I don't understand why my code doesn't work. please help.


Solution

  • Your program don't give correct answer when both of the longest common substrings are not prefixes of each input strings.

    For example, substringDiff(0, "xa", "ya") returns 0 while 1 is expected.

    This is because only number of differences from beginning of either strings are calculated.

    Example table for substringDiff(0, "xa", "ya"):

      y a
    x 1 1
    a 1 1
    

    To obtain correct answer, using this table,

    1. Set a length to check if satisfying common substrings of the length exists.
    2. Do checking using the length: it can be done by brute-force using the table (v[i][j] - v[i-length][j-length] is the number of mismatch of substrings that ends with i-th character of s1 and j-th character of s2)
    3. Using this check, do binary search for maximum length that satisfying common substring exists.