Tags: unicode, utf-8, indy10, delphi-6

Any chance for Indy 10 to output Unicode with Delphi 6?


I tried Indy 10 on Delphi 6.

The problem: with the old Indy, I could output Unicode as UTF-8 in an AnsiString by setting the proper encoding in ResponseInfo.ContentType. With Indy 10, I have lost the Unicode output. Here is how I output a Unicode string with the old Indy:

var
  MyUnicodeBodyString: WideString;

function MyUTF8Encode(const s: WideString): UTF8String;
var
  Len: Integer;
begin
  Len := WideCharToMultiByte(CP_UTF8, 0, PWideChar(s), Length(s), nil, 0, nil, nil);
  SetLength(Result, Len);
  if Len > 0 then
    WideCharToMultiByte(CP_UTF8, 0, PWideChar(s), Length(s), PAnsiChar(Result), Len, nil, nil);
end;

begin
  // ...
  AResponseInfo.ContentText := MyUTF8Encode(MyUnicodeBodyString);
end;

When I do the same with Indy 10, the output looks like this:

Товар

(that is, the UTF-8 string with each of its bytes encoded again as a separate Unicode character).

When I change the output to just

AResponseInfo.ContentText := MyUnicodeBodyString;

ASCII characters and characters from the "language for non-Unicode programs" (set in the Windows Control Panel) are output correctly. Text in other languages is garbled.

Indy 10 is programmed with "string" and probably assumes that "string" is WideString, but in Delphi 6 string is an alias for AnsiString.

Can I influence the output of the Indy 10 HTTP server without replacing every string in the Indy 10 source code with WideString?


Solution

  • Indy 10 is programmed with "string" and probably assumes that "string" is WideString

    That is incorrect. Indy's existence predates Delphi's switch to Unicode in Delphi 2009, so Indy has a lot of backwards compatibility for handling AnsiString in Delphi 2007 and earlier. In those versions, Indy does not use or assume WideString anywhere in its public API (well, except in the IIdTextEncoding interface); everything is based on AnsiString instead.

    in Delphi 6 string is an alias for AnsiString.

    Yes, exactly. That is why the preferred way to send non-ASCII content in an older ANSI version of Delphi is to use ANSI-encoded strings, e.g.:

    var
      MyAnsiBodyString: AnsiString;
    ...
    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentText := MyAnsiBodyString;
    ...
    

    If the AnsiString is encoded in the default OS ANSI codepage (as it typically should be), then Indy will simply convert the AnsiString to Unicode using that codepage by default, and then encode that Unicode result as UTF-8 for transmission.

    Can I influence the output of Indy 10 HTTP Server without replacing every string in Indy 10 source code with WideString ?

    Yes. In pre-Unicode versions of Delphi, most of Indy's components/classes have additional properties/parameters to specify an ANSI byte encoding, allowing Indy to properly convert an AnsiString to Unicode before charset-converting the Unicode to bytes for transmission (and vice versa on reception).

    So, if you want to send an AnsiString that is already pre-encoded as UTF-8, one approach is to manually set the AResponseInfo.ContentLength property, as well as the IOHandler.DefAnsiEncoding property, e.g.:

    var
      MyUtf8Str: UTF8String;
    ...
    MyUtf8Str := MyUTF8Encode(MyUnicodeBodyString);
    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentText := MyUtf8Str;
    AResponseInfo.ContentLength := Length(MyUtf8Str);
    AContext.Connection.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;
    ...
    

    If you don't set the ContentLength manually, TIdHTTPResponseInfo.WriteHeader() will calculate that value for you, by converting the ContentText to WideString using the RTL's default ANSI->Unicode conversion, and then encoding that WideString to UTF-8 to get the byte count. However, the initial ANSI->Unicode conversion will not know your AnsiString is encoded in UTF-8 and thus will not process it correctly.

    If you don't set the DefAnsiEncoding manually, TIdIOHandler.Write() will use the default DefAnsiEncoding setting of IndyTextEncoding_OSDefault and convert the ContentText to Unicode using the OS's default ANSI codepage. That codepage is likely not UTF-8, so the text will not be converted to Unicode properly before the Unicode result is encoded to UTF-8 bytes.
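
    To see why both settings matter, the following console sketch reproduces the double encoding outside of Indy, using only the Windows API. The helper names WideToBytes and BytesToWide are made up for this illustration, and CP_ACP is used here as a stand-in for the default ANSI->Unicode conversion described above. The exact garbage you get depends on the machine's ANSI codepage, so treat this as a rough model rather than a trace of what Indy does internally.

    ```pascal
    program DoubleEncodeDemo;

    {$APPTYPE CONSOLE}

    uses
      Windows;

    // Encodes a WideString to bytes in the given codepage (the same technique
    // as MyUTF8Encode in the question, with the codepage as a parameter).
    function WideToBytes(const S: WideString; CodePage: UINT): AnsiString;
    var
      Len: Integer;
    begin
      Len := WideCharToMultiByte(CodePage, 0, PWideChar(S), Length(S), nil, 0, nil, nil);
      SetLength(Result, Len);
      if Len > 0 then
        WideCharToMultiByte(CodePage, 0, PWideChar(S), Length(S), PAnsiChar(Result), Len, nil, nil);
    end;

    // Decodes bytes to a WideString, interpreting them in the given codepage.
    function BytesToWide(const S: AnsiString; CodePage: UINT): WideString;
    var
      Len: Integer;
    begin
      Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
      SetLength(Result, Len);
      if Len > 0 then
        MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(Result), Len);
    end;

    var
      Original, Misread: WideString;
      Utf8Once, Utf8Twice: AnsiString;
    begin
      // A five-letter Cyrillic sample, built from code points so that the
      // source file itself can stay plain ASCII.
      SetLength(Original, 5);
      Original[1] := WideChar($0422);
      Original[2] := WideChar($043E);
      Original[3] := WideChar($0432);
      Original[4] := WideChar($0430);
      Original[5] := WideChar($0440);

      // Step 1: the application pre-encodes the text as UTF-8.
      Utf8Once := WideToBytes(Original, CP_UTF8);

      // Step 2: the bytes are mistaken for text in the OS ANSI codepage.
      Misread := BytesToWide(Utf8Once, CP_ACP);

      // Step 3: the misread text is encoded to UTF-8 for transmission.
      Utf8Twice := WideToBytes(Misread, CP_UTF8);

      // On a single-byte ANSI codepage the second count is typically larger,
      // which is also why a precomputed byte count no longer matches.
      WriteLn('Bytes when encoded once:  ', Length(Utf8Once));
      WriteLn('Bytes when encoded twice: ', Length(Utf8Twice));
    end.
    ```

    Replacing CP_ACP with CP_UTF8 in step 2 recovers the original text, which is roughly what setting DefAnsiEncoding to IndyTextEncoding_UTF8 achieves.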

    Another approach is to use AResponseInfo.ContentStream instead of AResponseInfo.ContentText. That way, you can simply store your UTF-8 bytes in a TMemoryStream or TStringStream, and TIdHTTPResponseInfo.WriteContent() can then send those bytes as-is, e.g.:

    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentStream := TStringStream.Create(MyUTF8Encode(MyUnicodeBodyString));
    

    Or:

    var
      MyUtf8Str: UTF8String;
    ...
    MyUtf8Str := MyUTF8Encode(MyUnicodeBodyString);
    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentStream := TMemoryStream.Create;
    AResponseInfo.ContentStream.WriteBuffer(PAnsiChar(MyUtf8Str)^, Length(MyUtf8Str));
    AResponseInfo.ContentStream.Position := 0;
    

    Or:

    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentStream := TMemoryStream.Create;
    WriteStringToStream(AResponseInfo.ContentStream, MyUTF8Encode(MyUnicodeBodyString), IndyTextEncoding_UTF8, IndyTextEncoding_UTF8);
    AResponseInfo.ContentStream.Position := 0;
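
    If you need this in several handlers, the stream approach can be wrapped in a small helper. The following is a minimal sketch, not something that ships with Indy: the unit name Utf8Response and the procedure name SendUtf8Body are made up for illustration, and MyUTF8Encode is copied from the question.

    ```pascal
    unit Utf8Response;

    interface

    uses
      Windows, Classes, IdCustomHTTPServer;

    // Hypothetical helper: sends ABody as UTF-8 bytes through ContentStream.
    procedure SendUtf8Body(AResponseInfo: TIdHTTPResponseInfo; const ABody: WideString);

    implementation

    // Taken from the question: WideString -> UTF-8 bytes via the Windows API.
    function MyUTF8Encode(const S: WideString): UTF8String;
    var
      Len: Integer;
    begin
      Len := WideCharToMultiByte(CP_UTF8, 0, PWideChar(S), Length(S), nil, 0, nil, nil);
      SetLength(Result, Len);
      if Len > 0 then
        WideCharToMultiByte(CP_UTF8, 0, PWideChar(S), Length(S), PAnsiChar(Result), Len, nil, nil);
    end;

    procedure SendUtf8Body(AResponseInfo: TIdHTTPResponseInfo; const ABody: WideString);
    var
      Utf8: UTF8String;
      Stream: TMemoryStream;
    begin
      Utf8 := MyUTF8Encode(ABody);
      Stream := TMemoryStream.Create;
      try
        // Copy the raw UTF-8 bytes; no string conversion is involved here.
        if Utf8 <> '' then
          Stream.WriteBuffer(PAnsiChar(Utf8)^, Length(Utf8));
        Stream.Position := 0;
      except
        // Avoid leaking the stream if writing fails before it is handed over.
        Stream.Free;
        raise;
      end;
      AResponseInfo.CharSet := 'utf-8';
      AResponseInfo.ContentStream := Stream;
    end;

    end.
    ```

    In an OnCommandGet handler, this would be called as SendUtf8Body(AResponseInfo, MyUnicodeBodyString);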