I'm looking for a solution to the following string pattern matching problem.
You've got a function that takes two arguments: pattern, and input - both are strings.
Let's say pattern: aabbaa
and input: catcatdogdogcatcat
These specific arguments would be considered a match because there is a pattern in the words of input, and that pattern matches the pattern of characters in pattern.
Return a boolean to indicate whether or not there is a match.
The example given above would return true.
function isMatch(pattern, input) {
    // the pattern of repeated substrings within input is worked out
    // that pattern is checked against pattern
    return boolean; // placeholder: true if input conforms to pattern
}
The generalized problem "Find the patterns for a given string" is a lot harder to solve, because a string can conform to multiple patterns. For example,
catcatdogcat
conforms to many patterns. Here's a non-exhaustive list (pattern on the left, the matching split of the string on the right):
aaba          cat cat dog cat
a             catcatdogcat
ab            catcat dogcat
abcabcefgabc  c a t c a t d o g c a t
ababcab       ca t ca t dog ca t
So I don't think the approach of "find all patterns, then see if the proposed pattern is among them" will work.
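To get a feel for why enumerating every pattern is a dead end, here's a rough sketch that generates every split of the string and labels each one. The names segmentations and pattern_of are made up for illustration, and labelling pieces a, b, c, ... in order of first appearance is an assumption about what "the pattern" of a split means.

```python
def segmentations(string):
    """Yield every way to cut string into non-empty pieces."""
    if not string:
        yield []
        return
    for i in range(1, len(string) + 1):
        for rest in segmentations(string[i:]):
            yield [string[:i]] + rest


def pattern_of(pieces):
    """Label pieces a, b, c, ... in order of first appearance (assumed convention)."""
    labels = {}
    out = []
    for piece in pieces:
        if piece not in labels:
            labels[piece] = chr(ord("a") + len(labels))
        out.append(labels[piece])
    return "".join(out)


splits = list(segmentations("catcatdogcat"))
print(len(splits))  # 2048 splits for a 12-character string
patterns = {pattern_of(s) for s in splits}
print("aaba" in patterns, "ababcab" in patterns)  # True True
```

A string of n characters has 2^(n-1) splits, so this blows up quickly, which is the reason for letting the proposed pattern drive the search instead.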
That implies that we probably want to use the proposed pattern as a guideline to try to break the string down, but I'm not completely sure how that would look either.
In the specific case where the pattern starts and ends with the same letter (such as in aaba), I suppose you could start from the beginning and end of the string, consuming one character at a time until you get a match:
catcatdogcat
CatcatdogcaT
CAtcatdogcAT
CATcatdogCAT <-- Substring "CAT" is a candidate for "a". Check if that pattern matches.
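The trace above can be sketched roughly like this. The function name is hypothetical, and it assumes the two occurrences of the letter must not overlap, so only prefixes up to half the string length are tried. Each result is only a candidate; the rest of the pattern still has to be checked against it.

```python
def shared_prefix_suffix_candidates(string):
    """Return every prefix that is also a non-overlapping suffix, shortest first."""
    candidates = []
    for length in range(1, len(string) // 2 + 1):
        if string[:length] == string[-length:]:
            candidates.append(string[:length])
    return candidates


print(shared_prefix_suffix_candidates("catcatdogcat"))  # ['cat']
```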
But the more general case is harder again. A similar approach can be taken, though, such as trying every prefix of the string as a candidate to see if it conforms to the pattern, with backtracking:
catcatdogcat
Catcatdogcat <-- The string isn't "CCsomethingC", so a != "C"
CAtcatdogcat <-- the string isn't "CACAsomethingCA", so a != "CA"
CATcatdogcat <-- the string is "CATCATsomethingCAT", so a = "CAT" is a candidate.
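One way to sketch that "CATCATsomethingCAT" test is to turn the pattern into a regular expression, with the candidate substituted for the first letter and a wildcard standing in for every other letter. The name candidate_prefixes is made up, and using ".+" for "something" is an assumption (each other letter must cover at least one character). It only filters candidates for the first letter; it doesn't check that the other letters are used consistently.

```python
import re


def candidate_prefixes(pattern, string):
    """Return the prefixes of string that could stand for pattern's first letter."""
    first = pattern[0]
    result = []
    for length in range(1, len(string) + 1):
        candidate = string[:length]
        # First letter -> the literal candidate, any other letter -> "something".
        regex = "".join(
            re.escape(candidate) if letter == first else ".+"
            for letter in pattern
        )
        if re.fullmatch(regex, string, re.DOTALL):
            result.append(candidate)
    return result


print(candidate_prefixes("aaba", "catcatdogcat"))  # ['cat']
```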
Once you find a candidate, you can remove it from the string and from the pattern string, reducing the next step to comparing dog against the pattern b. In pseudocode,
checkpattern(pattern, string) :=
    if (pattern has just one letter)
        return true
    if (pattern has more than one letter, but it's one letter repeated)
        return whether the string repeats that way
    for (each substring from the start of the string, of length x = [1..])
        does the string match this candidate substring following the pattern?
        if (yes)
            if (checkpattern(pattern - letter, string - substring))
                return true
            else
                continue
        if (no)
            continue
    return false
I think that would work. Obviously there are a lot of details to this pseudocode, and it's not very efficient, but it'll get the job done.
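For what it's worth, here's a minimal runnable sketch of the same backtracking idea. It isn't a line-for-line translation of the pseudocode: instead of deleting the candidate from the string, it keeps a letter-to-substring table (bindings) and consumes the string from left to right, which is a common variant of the technique. The name check_pattern, the bindings parameter, and the rule that two different letters can't stand for the same substring are all assumptions.

```python
def check_pattern(pattern, string, bindings=None):
    """Return True if string can be split so that its pieces follow pattern."""
    if bindings is None:
        bindings = {}
    if not pattern:
        return not string  # pattern and string must run out together
    letter, rest = pattern[0], pattern[1:]
    if letter in bindings:
        piece = bindings[letter]
        return string.startswith(piece) and check_pattern(
            rest, string[len(piece):], bindings
        )
    # Try every prefix, leaving at least one character per remaining letter.
    for length in range(1, len(string) - len(rest) + 1):
        candidate = string[:length]
        if candidate in bindings.values():
            continue  # assumption: distinct letters mean distinct substrings
        bindings[letter] = candidate
        if check_pattern(rest, string[length:], bindings):
            return True
        del bindings[letter]  # backtrack and try a longer candidate
    return False


print(check_pattern("aabbaa", "catcatdogdogcatcat"))  # True
print(check_pattern("aabbaa", "catcatdogdogcatdog"))  # False
```

Worst case this is still exponential in the number of distinct letters, but the early startswith failures prune most branches on inputs like the ones above.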