I'm supporting a C++ application written using Borland C++ Builder 5.02 (from 1997). The find() method on the Borland string class does not behave as I would expect:
#include <cstring>
#include <iostream>

int main (int argc, char *argv[])
{
    string needle = "length == eighteen";
    string haystack = "<" + needle + ">";
    if (haystack.find(needle) != NPOS)
        cout << "Found it!" << endl;
    else
        cout << "Not found" << endl;
    return 0;
}
This program outputs "Not found". If I change the needle to something shorter, it outputs "Found it!". If I exchange the angle brackets for some other characters, it finds the needle. Spaces work, but parentheses fail just like the angle brackets.
Note that I am using the Borland string library here: if I #include <string> and use std::string instead then it works exactly how I would expect. Sadly changing the whole application to use STL strings is not a feasible answer!
From the documentation it seems that Borland uses a hash-based algorithm for string search. I can't find any more details about this, and I've stepped through the disassembly but am not much the wiser.
I find it very hard to believe that this is really a bug in the string library, particularly since if it were then I would expect to be able to find an article or something about it. I can't find any such information.
However, I've run out of ideas! Is this a known bug? Is there a fix?
EDIT: Having looked again at the disassembly, I think it's trying to do something like the Rabin-Karp algorithm, where the hash function is calculated mod 33554393 (the largest prime < 2^25). It could well be the polynomial hash function with a base of 32 (i.e. a_0 + 32 a_1 + 32^2 a_2 + ... + 32^n a_n) but that's just a hunch. Sounds like a possible overflow as Daniel Fischer suggested.
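For reference, here is a minimal sketch of a textbook Rabin-Karp search using the two constants I think I see in the disassembly (base 32, modulus 33554393). This is not Borland's code: the function name rabin_karp_find is made up, it uses std::string rather than the Borland class, and it weights the first character with the highest power of the base, which is the usual textbook ordering and may not match what the library does. All intermediate arithmetic is done in 64 bits, so the products cannot overflow here; a variant doing the same steps in 32-bit signed arithmetic would be more exposed to that kind of problem.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Borland's implementation): Rabin-Karp search
// with base 32 and modulus 33554393, the constants guessed above.
// Returns the index of the first match, or std::string::npos.
std::size_t rabin_karp_find(const std::string& haystack, const std::string& needle)
{
    const std::uint64_t base = 32;
    const std::uint64_t mod = 33554393;  // prime just below 2^25
    const std::size_t m = needle.size();
    const std::size_t n = haystack.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    // high = base^(m-1) mod mod, the weight of the window's first character.
    std::uint64_t high = 1;
    for (std::size_t i = 1; i < m; ++i) high = (high * base) % mod;

    // Hash the needle and the first window of the haystack.
    std::uint64_t hn = 0, hw = 0;
    for (std::size_t i = 0; i < m; ++i) {
        hn = (hn * base + static_cast<unsigned char>(needle[i])) % mod;
        hw = (hw * base + static_cast<unsigned char>(haystack[i])) % mod;
    }

    for (std::size_t i = 0; ; ++i) {
        // Equal hashes are only a hint; confirm with a real comparison.
        if (hn == hw && haystack.compare(i, m, needle) == 0) return i;
        if (i + m >= n) break;
        // Roll the window: drop haystack[i], append haystack[i + m].
        std::uint64_t out = (static_cast<unsigned char>(haystack[i]) * high) % mod;
        hw = (hw + mod - out) % mod;  // adding mod keeps the value non-negative
        hw = (hw * base + static_cast<unsigned char>(haystack[i + m])) % mod;
    }
    return std::string::npos;
}
```

Calling rabin_karp_find("<length == eighteen>", "length == eighteen") with this sketch finds the needle at index 1, which is the behaviour I would expect from find().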
I have found a reference from 1998 suggesting Borland's implementation of searching strings has a bug:
https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/cstring$20bug/borland.public.cpp.language/XBzjaJmCYpk/gtMPm-j8jugJ
Also, it appears that at some point in history the C++ committee decided that a string class would be part of standard C++, and cstring's string class is a remnant of this:
https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/borland$20cstring/borland.public.cpp.language/2psY2seRmS4/ywVrqwU1C2wJ