
Function to extract plain text from RTF file gives wrong result


In a 32-bit VCL application built with Delphi 11 Alexandria on Windows 10, I need to search for text in an RTF file. So I use this function (found here) to extract the plain text from the RTF file:

function RtfToText(const RTF_FilePath: string; ReplaceLineFeedWithSpace: Boolean): string;
var
  RTFConverter: TRichEdit;
  MyStringStream: TStringStream;
begin
  RTFConverter := TRichEdit.CreateParented(HWND_MESSAGE);
  try
    MyStringStream := TStringStream.Create(RTF_FilePath);
    try
      RTFConverter.Lines.LoadFromStream(MyStringStream);
      RTFConverter.PlainText := True;
      RTFConverter.Lines.StrictDelimiter := True;
      if ReplaceLineFeedWithSpace then
        RTFConverter.Lines.Delimiter := ' '
      else
        RTFConverter.Lines.Delimiter := #13;
      Result := RTFConverter.Lines.DelimitedText;
    finally
      MyStringStream.Free;
    end;
  finally
    RTFConverter.Free;
  end;
end;

However, instead of the RTF file's plain text content, the function gives back the file path of the RTF file!

What is wrong with this function, and how can I efficiently extract the plain text from an RTF file without having to use a parented TRichEdit control?


Solution

  • The TStringStream constructor does not load a file, as you are expecting it to. TStringStream is not TFileStream. As its name suggests, TStringStream is a stream wrapper for a string, so its constructor takes a string and copies it as-is into the stream. Thus, you are loading the RichEdit with the file path string itself, not with the content of the file that the string refers to.
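
    To see this in isolation, here is a minimal console sketch. The path is a made-up placeholder and the file does not need to exist, because TStringStream never touches the disk:

    ```delphi
    program StringStreamDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    var
      SS: TStringStream;
    begin
      // Hypothetical path, for illustration only; no file is opened here.
      SS := TStringStream.Create('C:\Docs\Sample.rtf');
      try
        // The stream wraps the string itself, so this writes the path back out.
        Writeln(SS.DataString);
      finally
        SS.Free;
      end;
    end.
    ```

    If a stream over the file's content were really wanted, TFileStream.Create(RTF_FilePath, fmOpenRead or fmShareDenyWrite) would be the class to reach for instead.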

    You don't actually need the TStringStream at all, as the TRichEdit can load the file directly, e.g.:

    function RtfToText(const RTF_FilePath: string; ReplaceLineFeedWithSpace: Boolean): string;
    var
      RTFConverter: TRichEdit;
    begin
      RTFConverter := TRichEdit.CreateParented(HWND_MESSAGE);
      try
        RTFConverter.PlainText := False; // stream the file in as RTF (this is also the default)
        RTFConverter.Lines.LoadFromFile(RTF_FilePath);
        RTFConverter.PlainText := True;
        RTFConverter.Lines.StrictDelimiter := True;
        if ReplaceLineFeedWithSpace then
          RTFConverter.Lines.Delimiter := ' '
        else
          RTFConverter.Lines.Delimiter := #13;
        Result := RTFConverter.Lines.DelimitedText;
      finally
        RTFConverter.Free;
      end;
    end;
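
    One possible caveat with DelimitedText, which I have not verified against Delphi 11: it is meant to round-trip a list, so it may wrap a line in QuoteChar when that line contains the delimiter or the quote character. With a space as the delimiter, that could put quotes around most lines. A hedged alternative sketch reads Lines.Text instead and replaces the line breaks afterwards. The unit name and the names FlattenLineBreaks and RtfToTextViaLinesText are made up for this example:

    ```delphi
    unit RtfPlainText;

    interface

    // Hypothetical variant of RtfToText that avoids DelimitedText.
    function RtfToTextViaLinesText(const RTF_FilePath: string;
      ReplaceLineFeedWithSpace: Boolean): string;

    implementation

    uses
      Winapi.Windows, System.SysUtils, Vcl.ComCtrls;

    // Hypothetical helper: turns every line break into a single space.
    function FlattenLineBreaks(const S: string): string;
    begin
      // Replace CRLF first so a Windows line break becomes one space, not two.
      Result := StringReplace(S, #13#10, ' ', [rfReplaceAll]);
      Result := StringReplace(Result, #13, ' ', [rfReplaceAll]);
      Result := StringReplace(Result, #10, ' ', [rfReplaceAll]);
    end;

    function RtfToTextViaLinesText(const RTF_FilePath: string;
      ReplaceLineFeedWithSpace: Boolean): string;
    var
      RTFConverter: TRichEdit;
    begin
      RTFConverter := TRichEdit.CreateParented(HWND_MESSAGE);
      try
        RTFConverter.PlainText := False; // parse the file as RTF
        RTFConverter.Lines.LoadFromFile(RTF_FilePath);
        // Lines.Text returns the unformatted text; drop any trailing line break.
        Result := TrimRight(RTFConverter.Lines.Text);
        if ReplaceLineFeedWithSpace then
          Result := FlattenLineBreaks(Result);
      finally
        RTFConverter.Free;
      end;
    end;

    end.
    ```

    Note that this variant keeps whatever line breaks Lines.Text produces when ReplaceLineFeedWithSpace is False, rather than joining lines with #13. For a plain search, something like Pos(SearchText, RtfToTextViaLinesText(RTF_FilePath, True)) > 0 should then be enough.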
    

    That being said, there is nothing outside of TRichEdit in the native RTL or VCL that will parse RTF into plain text for you. If you don't want to use TRichEdit, you will have to either parse the RTF yourself or find a third-party parser to use.