algorithmtime-complexitystring-searchknuth-morris-pratt

KMP algorithm worst case analysis


I can't understand how KMP maintains O(m+n). I'm looking for a pattern "aaaab" in "aaaaaaaaaa...".

Then the prefix table would be

aaaab

01230

Every time mismatch happens on 'b', it's prefix length is always 0. So the pattern shifts only one step right.

aaaaaaaaaa...

aaaab

aaaaaaaaaa...

_aaaab

aaaaaaaaaa...

__aaaab

And for every trial, I need to compare full n times since mismatch happens at the very last 'b'. Thus it still requires O(m*n) comparisons.

Can anyone explain how KMP get O(m+n) for me? Thanks in advance.


Solution

  • The trick is that you don't just advance the position in the string by 1 char when you encounter a mismatch. KMP is designed to avoid doing just that. In your example the mismatch occurs after 4 consecutive matching a's. There's no b in those 4 a's, so you can advance your search position in the string by 4, not by 1. You keep doing this and end up with O(m).

    For all of this to work, KMP uses the pattern to precompute a helper table. This table will basically tell you how much to advance the position within the string for a mismatch occurring in a given position within the pattern. It turns out that the table is also built in linear time, O(n).

    See wikipedia and other places for examples, details and (pseudo)code.