javaalgorithmiterationthresholdedit-distance

Editing distance with limit (threshold)


I work for a project in Java 8 and I want to compute the editing distance for 2 strings - in a iterative way (so without recursion); the method will be executed many times, so I need to improve it with a given limit (threshold), meaning that if the editing distance from 2 strings is greater than a given number x, the algorithm will be stopped.

Now, where from should I find a method like this? Is there a java library (or provided from maven) which contain such a method? e.g. in StringUtils?

For current implementation I use classic editing distance algorithm.

    static int min(int x, int y, int z)
{
    if (x <= y && x <= z)
        return x;
    if (y <= x && y <= z)
        return y;
    else
        return z;
}

static int editDistDP(String str1, String str2, int m,
                      int n)
{
    // Create a table to store results of subproblems
    int dp[][] = new int[m + 1][n + 1];

    // Fill d[][] in bottom up manner
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            // If first string is empty, only option is
            // to insert all characters of second string
            if (i == 0)
                dp[i][j] = j; // Min. operations = j

            // If second string is empty, only option is
            // to remove all characters of second string
            else if (j == 0)
                dp[i][j] = i; // Min. operations = i

            // If last characters are same, ignore last
            // char and recur for remaining string
            else if (str1.charAt(i - 1)
                     == str2.charAt(j - 1))
                dp[i][j] = dp[i - 1][j - 1];

            // If the last character is different,
            // consider all possibilities and find the
            // minimum
            else
                dp[i][j] = 1
                           + min(dp[i][j - 1], // Insert
                                 dp[i - 1][j], // Remove
                                 dp[i - 1]
                                   [j - 1]); // Replace
        }
    }

    return dp[m][n];
}

Also, I find this which use Levenstein distance with a given limit, but I wouldn't want to change to Levenstein if it ins't necessary


Solution

  • I think this by Apache Commons is exactly what you are looking for which has a constructor with threshold for maximum distance.

    Pseudocode:

    LevenshteinDistance ld = new LevenshteinDistance(10);
    Integer editDistanceWithThreshold = ld.apply('String1', 'String2')