Tags: delphi, delphi-xe7

SearchBuf soWholeWord unexpected output


When testing StrUtils.SearchBuf with the [soWholeWord,soDown] options, I got some unexpected results.

program Project1;

Uses
  SysUtils,StrUtils;

function WordFound(aString,searchString: String): Boolean;
begin
  Result := SearchBuf(PChar(aString),Length(aString), 0, 0, searchString, 
    [soWholeWord,soDown]) <> nil;
end;

Procedure Test(aString,searchString: String);
begin
  WriteLn('"',searchString,'" in "',aString,'"',#9,' : ',
    WordFound(aString,searchString));
end;

begin
  Test('Delphi','Delphi');   // True
  Test('Delphi ','Delphi');  // True
  Test(' Delphi','Delphi');  // False
  Test(' Delphi ','Delphi'); // False
  ReadLn;
end.

Why are ' Delphi' and ' Delphi ' not considered a whole word?

What about a reverse search?

function WordFoundRev(aString,searchString: String): Boolean;
begin
  Result := SearchBuf(PChar(aString),Length(aString),Length(aString)-1,0,searchString, 
    [soWholeWord]) <> nil;
end;

Procedure TestRev(aString,searchString: String);
begin
  WriteLn('"',searchString,'" in "',aString,'"',#9,' : ',
    WordFoundRev(aString,searchString));
end;

begin
  TestRev('Delphi','Delphi');   // False
  TestRev('Delphi ','Delphi');  // True
  TestRev(' Delphi','Delphi');  // False
  TestRev(' Delphi ','Delphi'); // True
  ReadLn;
end.

I can't make any sense of this at all, except that the function seems to be buggy.

Same results in XE7, XE6 and XE.


Update

QC127635 StrUtils.SearchBuf fails with [soWholeWord] option


Solution

  • It looks like a bug to me. Here's the code that does the search:

    while SearchCount > 0 do
    begin
      if (soWholeWord in Options) and (Result <> @Buf[SelStart]) then
        if not FindNextWordStart(Result) then Break;
      I := 0;
      while (CharMap[(Result[I])] = (SearchString[I+1])) do
      begin
        Inc(I);
        if I >= Length(SearchString) then
        begin
          if (not (soWholeWord in Options)) or
             (SearchCount = 0) or
             ((Byte(Result[I])) in WordDelimiters) then
            Exit;
          Break;
        end;
      end;
      Inc(Result, Direction);
      Dec(SearchCount);
    end;
    

    Each time round the while loop we check whether soWholeWord is in the options and, if it is, advance to the start of the next word. But we only do that advancing if

    Result <> @Buf[SelStart]
    

    Now, Result is the current pointer into the buffer, i.e. the candidate position for a match. So this test checks whether or not we are at the position where the search started, which is the start of the string being searched when SelStart is 0.

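    What this test means is that, if the searched string begins with non-alphanumeric text, we cannot advance past that text to the start of the first word. On the next iteration Result has moved on by one character and now points into the first word, so FindNextWordStart skips the rest of that word and lands on the start of the following one. The first word is therefore never compared.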

    Now, you might decide to remove the test for

    Result <> @Buf[SelStart]
    

    But if you do that you'll find that you no longer match the word if it is located right at the start of the string. So you'll just fail in a different way. The right way to deal with this would be to make sure that FindNextWordStart doesn't advance if we are at the start of the string, and the text there is alphanumeric.

    My guess is that the original author wrote the code like this:

    if (soWholeWord in Options) then
      if not FindNextWordStart(Result) then Break;
    

    Then they discovered that words at the start of the string would not match and changed the code to:

    if (soWholeWord in Options) and (Result <> @Buf[SelStart]) then
      if not FindNextWordStart(Result) then Break;
    

    And nobody tested what happened if the string started with non-alphanumeric text.
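    To see both failure modes side by side, here is a simplified model of the forward loop quoted above. It is a sketch, not the RTL code: it uses 1-based string indices instead of PChar, compares case-sensitively, assumes SelStart = 0 and SelLength = 0, and assumes that FindNextWordStart (whose body is not shown above) skips the rest of the current word and then the delimiters after it. ModelFound, IsDelim and TGuard are hypothetical names.

```pascal
program SearchBufModel;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical: which guard runs before the word-start skip.
  TGuard = (gOriginal, gNoGuard);

// Assumption: only ASCII letters and digits are word characters, as in the
// StrUtils WordDelimiters set. Positions past the end act like the #0
// terminator and count as delimiters.
function IsDelim(const S: string; I: Integer): Boolean;
begin
  if (I < 1) or (I > Length(S)) then
    Result := True
  else
    Result := not CharInSet(S[I], ['a'..'z', 'A'..'Z', '0'..'9']);
end;

// Simplified model of the forward soWholeWord loop (case-sensitive).
function ModelFound(const S, Sub: string; Guard: TGuard): Boolean;
var
  P, SearchCount: Integer;
begin
  Result := False;
  P := 1;                                      // Result = @Buf[SelStart]
  SearchCount := Length(S) - Length(Sub) + 1;  // candidate positions left
  while SearchCount > 0 do
  begin
    // gOriginal mirrors "Result <> @Buf[SelStart]"; gNoGuard drops the test.
    if (Guard = gNoGuard) or (P <> 1) then
    begin
      // Assumed FindNextWordStart (forward): skip the rest of the current
      // word, then the delimiters after it.
      while (SearchCount > 0) and not IsDelim(S, P) do
      begin
        Inc(P);
        Dec(SearchCount);
      end;
      while (SearchCount > 0) and IsDelim(S, P) do
      begin
        Inc(P);
        Dec(SearchCount);
      end;
      if SearchCount = 0 then
        Break;                                 // no further word start
    end;
    // Match followed by a delimiter (or the end): whole word found.
    if (Copy(S, P, Length(Sub)) = Sub) and IsDelim(S, P + Length(Sub)) then
      Exit(True);
    Inc(P);
    Dec(SearchCount);
  end;
end;

const
  Inputs: array[0..3] of string = ('Delphi', 'Delphi ', ' Delphi', ' Delphi ');
var
  I: Integer;
begin
  for I := Low(Inputs) to High(Inputs) do
    WriteLn('"', Inputs[I], '"', #9,
      ' original: ', ModelFound(Inputs[I], 'Delphi', gOriginal),
      '  no guard: ', ModelFound(Inputs[I], 'Delphi', gNoGuard));
end.
```

    With the original guard the model reports True, True, False, False for the four inputs, matching the question; with the guard removed it reports False, False, True, True, which is the trade-off described above.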

    Something like this seems to get the job done:

    if (soWholeWord in Options) then
      if (Result <> @Buf[SelStart]) or not Result^.IsLetterOrDigit then
        if not FindNextWordStart(Result) then Break;