c++stringalgorithmsubstringtime-complexity

C++ string::find complexity


Why doesn't c++'s implementation of string::find() use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)? Is that corrected in C++0x? If the complexity of current find is not O(N * M), what is that?

so what algorithm is implemented in gcc? is that KMP? if not, why? I've tested that and the running time shows that it runs in O(N * M)


Solution

  • Why the c++'s implemented string::substr() doesn't use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)?

    I assume you mean find(), rather than substr() which doesn't need to search and should run in linear time (and only because it has to copy the result into a new string).

    The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string operations are that size(), max_size(), operator[], swap(), c_str() and data() are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.

    The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.

    Is that corrected in c++0x?

    No, C++11 doesn't add any complexity requirements to std::string, and certainly doesn't add any mandatory implementation details.

    If the complexity of current substr is not O(N * M), what is that?

    That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N). So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.