c++stringc++builderc++builder-5

Borland string::find bug


I'm supporting a C++ application written using Borland C++ Builder 5.02 (from 1997). The find() method on the Borland string class does not behave how I would expect:

#include <cstring>
#include <iostream>

int main (int argc, char *argv[])
{
   string needle = "length == eighteen";
   string haystack = "<" + needle + ">";
   if (haystack.find(needle) != NPOS)
      cout << "Found it!" << endl;
   else
      cout << "Not found" << endl;

   return 0;
}

This program outputs Not found. If I change the needle to something shorter it outputs Found it!. If I exchange the angle brackets for some other characters it finds it. Spaces work, but parentheses also don't.

Note that I am using the Borland string library here: if I #include <string> and use std::string instead then it works exactly how I would expect. Sadly changing the whole application to use STL strings is not a feasible answer!

From the documentation it seems that Borland uses a hash-based algorithm for string search. I can't find any more details about this, and I've stepped through the disassembly but am not much the wiser.

I find it very hard to believe that this is really a bug in the string library, particularly since if it were then I would expect to be able to find an article or something about it. I can't find any such information.

However, I've run out of ideas! Is this a known bug? Is there a fix?

EDIT: Having looked again at the disassembly, I think it's trying to do something like the Rabin-Karp algorithm, where the hash function is calculated mod 33554393 (the largest prime < 2^25). It could well be the polynomial hash function with a base of 32 (i.e. a_0 + 32 a_1 + 32^2 a_2 + .. + 32^n a_n) but that's just a hunch. Sounds like a possible overflow as Daniel Fischer suggested.


Solution

  • I have found a reference from 1998 suggesting Borland's implementation of searching strings has a bug:

    https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/cstring$20bug/borland.public.cpp.language/XBzjaJmCYpk/gtMPm-j8jugJ

    Also, it appears that at some point in history the C++ commitee decided that a string class would be part of standard C++, and cstring's string class is a remnant of this:

    https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/borland$20cstring/borland.public.cpp.language/2psY2seRmS4/ywVrqwU1C2wJ