stringalgorithmpattern-matchingautomatonrabin-karp

How do you detect a pattern match when comparing two unique strings?


I'm looking for a solution to the following string pattern matching problem.

You've got a function that takes two arguments: pattern, and input - both are strings.

Let's say pattern: aabbaa and input: catcatdogdogcatcat

These specific arguments would be considered a match because there is a pattern in the characters of input, and that pattern matches the pattern of words in pattern

Return a boolean to indicate where or not there is a match. The examples given above would return 1.

function (pattern, input) {
  // patterns within the input string are elucidated from input
  // those patterns are checked against the pattern
  return boolean
}

Solution

  • The generalized problem "Find the patterns for a given string" is a lot harder to solve, because a string can conform to multiple patterns. For example,

    catcatdogcat
    

    Conforms to many patterns. Here's a non-exhaustive list:

    aaba           cat cat dog cat
    a              catcatdogcat
    ab             catcat dogcat
    abcabcefgabc   c a t c a t d o g c a t
    ababcab        ca t ca t dog ca t
    

    So I don't think the approach of "find all patterns, then see if the proposed pattern is among them" will work.

    That implies that we probably want to use the proposed pattern as a guideline to try to break the string down, but I'm not completely sure how that would look either.

    In the specific case when the pattern starts and ends with the same substring (such as in aaba), I suppose you could start from the beginning and ending of the string, consuming one character at a time until you get a match:

    catcatdogcat
    CatcatdogcaT
    CAtcatdogcAT
    CATcatdogCAT <-- Substring "CAT" is a candidate for "a". Check if that pattern matches.
    

    But the more general case is harder again. A similar approach can be taken, though, such as trying every string to see if it conforms to the pattern, with backtracking:

    catcatdogcat
    Catcatdogcat <-- The string isn't "CCsomethingC", so a != "C"
    CAtcatdogcat <-- the string isn't "CACAsomethingCA", so a != "CA"
    CATcatdogcat <-- the string is "CATCATsomethingCAT", so a = "CAT" is a candidate.
    

    Once you find a candidate, you can remove it from the string and from the pattern string, reducing the next step to comparing dog against the pattern b. In pseudocode,

    checkpattern(pattern, string) :=
      if (pattern has just one letter) 
        return true
      if (pattern has more than one letter, but it's one character repeated)
        return whether the string repeats that way
      for (each substring from the start of the string of length x=[0..])
        does the string match this candidate substring following the pattern?
        if (yes)
          if (checkpattern(pattern - letter, string - substring))
            return true
          else
            continue
        if (no)
          continue
      return false
    

    I think that would work. Obviously there are a lot of details to this pseudocode, and it's not very efficient, but it'll get the job done.