stringalgorithmpattern-matching

Understanding Knuth-Morris-Pratt Algorithm


Can someone explain this to me? I've been reading about it and it still is hard to follow.

text : ababdbaababa
pattern: ababa

table for ababa is -1 0 0 1 2.

I think I understand how the table is constructed but, I dont understand how to shift once mismatch has occurred. Seems like we dont even use the table when shifting?

when do we use the table?


Solution

  • The table is used when your mismatch occurs. Let's apply the pattern to your text:

    You start matching text with pattern and test if your pattern could be in text, starting at the first position. You compare text[1] with pattern[1] and that turns out to be a match. You do the same for text[2], text[3] and text[4].

    When you want to match text[5] with pattern[5] you don't have a match (d<>a). You then know that your pattern will not start at the first position. You could then start the matching all over again for position 2 but that is not efficient. You can use the table now.

    The error occured at pattern[5] so you go to table[5] which is 2. That tells you that you can start matching at the current position again with 2 already matched characters. Instead of having to start matching position 2, you can start at your previous position (1) + table[5] (2)=3. Indeed, If we look at text[3] and text[4], we see that it is equal to pattern[1] and pattern[2], respectivily.

    The numbers in table tell you how many positions are already matched when an error occurs. In this case 2 characters of the next pattern were already matched. You can then immediately start matching for position 3 and skip position 2 (as the pattern can not be found starting at position[2]).