Sending UTF-8 encoded response with Delphi 7 and Indy 10


I need to send a UTF-8 encoded response from an Indy 10 HTTP server, and the response includes special characters (like ő and á). The original program was written with Indy 9 and had no such problem, but according to Remy Lebeau:

On pre-2009 versions of Delphi, Indy 10 will internally perform a conversion from AnsiString to UTF-16 to Bytes if the specified Ansi and Byte encodings are different. During that conversion, if the Byte encoding is Indy8BitEncoding, UTF-16 codeunits above U+00FF will be converted to '?' characters. In order to send an AnsiString as-is, you have to set the Ansi and Byte encodings to the same TIdTextEncoding object.

But I can't find a way to do this properly. The IOHandler of the HTTP server has no DefStringEncoding property, so I've tried the following conversions with no luck:

        AResponseInfo.ContentEncoding:='utf8';
        AResponseInfo.ContentType:='text/html';

        ss:=TStringStream.Create('ő');
        ss.WriteString(' '+AnsiString('ő'));
        ss.WriteString(' '+WideString('ő'));
        ss.WriteString(' '+AnsiToUtf8('ő'));

        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_8Bit, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_Default, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_ASCII, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF16BE, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF16LE, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF7, IndyTextEncoding_8Bit)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF8, IndyTextEncoding_8Bit)+' ';

        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_8Bit, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_Default, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_ASCII, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF16BE, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF16LE, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF7, IndyTextEncoding_UTF8)+' ';
        ss.Seek(0, 0);
        AResponseInfo.ContentText:=AResponseInfo.ContentText+ReadStringFromStream(ss, -1, IndyTextEncoding_UTF8, IndyTextEncoding_UTF8)+' ';

With this I got the following response:

? ? ?? ? ? ?? ??? ??? o o L' ? ? ? Aµ Aµ A.Â' dz? dz? dz?dz? dz? dz? dz?dz? ⃵⃵âƒ. d" d" e"  Aµ Aµ A.Â' o o L'

The closest one is o, but it is missing the accent. The L' seems promising too, since ő shows up as Ĺ‘ when its UTF-8 bytes are interpreted as single-byte characters, but that's not exactly the same either.

How can I solve this?

Update

If I set AResponseInfo.CharSet to UTF-8 and then set ContentText to the desired string in ANSI (without converting it to anything), it works.

But now I'm facing another problem: when my ContentText is already in UTF-8, Indy 10 tries to convert it to UTF-8 again. Since I can't set the DefStringEncoding because it's not available here, I can't make Indy 10 skip the conversion. The only workaround is to convert the UTF-8 string back to ANSI and then let Indy convert it to UTF-8 again.
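
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not a recommended approach: SendUtf8ViaAnsi and Utf8Body are hypothetical names, and the code assumes Delphi 7 with IdCustomHTTPServer in the uses clause. Utf8ToAnsi is the standard RTL function.

```pascal
// Sketch only. Utf8Body is a hypothetical parameter holding text that is
// already UTF-8 encoded (UTF8String is an AnsiString alias in Delphi 7).
// Requires IdCustomHTTPServer in the uses clause.
procedure SendUtf8ViaAnsi(AResponseInfo: TIdHTTPResponseInfo;
  const Utf8Body: UTF8String);
begin
  AResponseInfo.ContentType := 'text/html';
  AResponseInfo.CharSet := 'utf-8';
  // Convert back to the system ANSI code page; Indy then re-encodes
  // ContentText to UTF-8 when it writes the response.
  AResponseInfo.ContentText := Utf8ToAnsi(Utf8Body);
end;
```

Note that the round trip through ANSI is lossy: any character that is not in the system's ANSI code page cannot survive the Utf8ToAnsi step.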


Solution

  • The IOHandler of the HTTP server has no DefStringEncoding property ... I can't set the DefStringEncoding because it's not available here

    Yes, it is available. It is a property of the connection's IOHandler, not of the server's IOHandler.

    Also, you are looking for DefAnsiEncoding rather than DefStringEncoding. DefAnsiEncoding represents the AnsiString encoding in memory, whereas DefStringEncoding represents the on-the-wire encoding over the socket (which is handled by AResponseInfo.CharSet).

    Try this:

    AResponseInfo.ContentType := 'text/html';
    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentText := UTF8Encode('ő');
    
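    // The on-the-wire encoding is already handled by
    // TIdHTTPResponseInfo.WriteContent() based on CharSet, so this is not needed:
    // AContext.Connection.IOHandler.DefStringEncoding := IndyTextEncoding_UTF8;

    // Tell Indy that the AnsiString in ContentText already holds UTF-8:
    AContext.Connection.IOHandler.DefAnsiEncoding := IndyTextEncoding_UTF8;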
    

    That being said, a simpler solution would be to put your UTF-8 content into a TStream and then use AResponseInfo.ContentStream instead of AResponseInfo.ContentText. The TStream bytes will be transmitted as-is.

    AResponseInfo.ContentType := 'text/html';
    AResponseInfo.CharSet := 'utf-8';
    AResponseInfo.ContentStream := TStringStream.Create(UTF8Encode('ő'));
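
For content that is already UTF-8 encoded, the stream approach might look like the following inside an OnCommandGet handler. This is a sketch under assumptions: TForm1 and IdHTTPServer1 are placeholder names, the units IdContext, IdCustomHTTPServer and Classes are assumed to be in the uses clause, and the comment about stream ownership reflects my understanding that FreeContentStream defaults to True.

```pascal
// Sketch only: TForm1 and IdHTTPServer1 are placeholder names.
// Requires Classes, IdContext and IdCustomHTTPServer in the uses clause.
procedure TForm1.IdHTTPServer1CommandGet(AContext: TIdContext;
  ARequestInfo: TIdHTTPRequestInfo; AResponseInfo: TIdHTTPResponseInfo);
var
  Body: UTF8String;       // bytes that are already UTF-8 encoded
  Stream: TMemoryStream;
begin
  Body := UTF8Encode('ő'); // stands in for any existing UTF-8 data

  Stream := TMemoryStream.Create;
  try
    // Copy the raw bytes; no string encoding conversion is involved.
    if Length(Body) > 0 then
      Stream.WriteBuffer(Body[1], Length(Body));
    Stream.Position := 0;
  except
    Stream.Free;
    raise;
  end;

  AResponseInfo.ContentType := 'text/html';
  AResponseInfo.CharSet := 'utf-8';
  // Assumption: the response object frees the stream after sending,
  // because FreeContentStream is True by default.
  AResponseInfo.ContentStream := Stream;
end;
```

Because the bytes never pass through ContentText, neither DefAnsiEncoding nor DefStringEncoding comes into play, which avoids the double conversion described in the question.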