First I get a TMemoryStream from an HTTP request, which contains the body of the response. Then I load it into a TStringList and save the text in a WideString (I also tried with AnsiString).
The problem is that I need to convert the string: the users' language is Spanish, so vowels with accent marks are very common, and I need to store that text correctly.
lServerResponse := TStringList.Create;
lServerResponse.LoadFromStream(lResponseMemoryStream);
lStringResponse := lServerResponse.Text;
lDecodedResponse := Utf8Decode(lStringResponse );
If the response (a part of it) is "Hólá Múndó", lStringResponse value will be "Hólá Múndó", and lDecodedResponse will be "Hólá Múndó".
But if the user adds any emoji (lStringResponse value will be "Hólá Múndó 😀" if the emoji is 😀), Utf8Decode fails and returns an empty string. Is there a way to get just the ANSI characters from a string (or MemoryStream), or to remove whatever Utf8Decode can't convert?
Thanks for your time.
TMemoryStream is just raw bytes. There is no reason to load that stream into a TStringList just to extract a (Wide|Ansi)String from it. You can assign the bytes directly to an AnsiString/UTF8String using SetString() instead, eg:
var
  lStringResponse: UTF8String;
  lDecodedResponse: WideString;
begin
  // Copy the raw response bytes straight into a UTF8String
  SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
  lDecodedResponse := UTF8Decode(lStringResponse);
end;
Just make sure the HTTP content really is encoded as UTF-8, or else this approach will not work.
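One way to check that assumption before decoding is to ask Windows to validate the bytes. The following is a minimal sketch, not a definitive implementation: LooksLikeUTF8 and cMBErrInvalidChars are hypothetical names introduced here, and the approach assumes a Windows version where MultiByteToWideChar() accepts the MB_ERR_INVALID_CHARS flag together with CP_UTF8 (very old Windows versions may behave differently).

```pascal
program CheckUtf8;
{$APPTYPE CONSOLE}

uses
  Windows;

const
  // Value of MB_ERR_INVALID_CHARS from the Win32 headers, declared locally
  // (assumption: kept local so the sketch does not rely on the Windows unit).
  cMBErrInvalidChars = $00000008;

// Hypothetical helper: True if the buffer decodes cleanly as UTF-8.
function LooksLikeUTF8(P: PAnsiChar; Len: Integer): Boolean;
begin
  if Len <= 0 then
    Result := True // an empty body has nothing invalid in it
  else
    // With this flag the call returns 0 when it meets a malformed sequence
    Result := MultiByteToWideChar(CP_UTF8, cMBErrInvalidChars, P, Len, nil, 0) > 0;
end;

var
  Good, Bad: AnsiString;
begin
  Good := 'H'#$C3#$B3'la'; // "Hóla" encoded as UTF-8 bytes
  Bad := 'H'#$F3'la';      // "Hóla" encoded as Windows-1252, not valid UTF-8
  Writeln(LooksLikeUTF8(PAnsiChar(Good), Length(Good)));
  Writeln(LooksLikeUTF8(PAnsiChar(Bad), Length(Bad)));
end.
```

With the stream from the question, the call would be LooksLikeUTF8(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size). Checking the charset in the response's Content-Type header, when the server sends one, is still the more reliable signal.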
That being said, UTF8Decode() (and UTF8Encode()) in Delphi 7 DO NOT support Unicode codepoints above U+FFFF, which means they DO NOT support emojis such as 😀 (U+1F600) at all. That was fixed in Delphi 2009.
To work around that issue in earlier versions, you can use the Win32 API MultiByteToWideChar() function instead, eg:
uses
  ..., Windows;

function My_UTF8Decode(const S: UTF8String): WideString;
var
  WLen: Integer;
begin
  // First call: ask how many UTF-16 code units the output needs
  WLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
  if WLen > 0 then
  begin
    SetLength(Result, WLen);
    // Second call: perform the actual conversion into Result
    MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), PWideChar(Result), WLen);
  end else
    Result := ''; // empty input or conversion failure
end;

var
  lStringResponse: UTF8String;
  lDecodedResponse: WideString;
begin
  SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
  lDecodedResponse := My_UTF8Decode(lStringResponse);
end;
Alternatively:
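uses
  ..., Windows;

// Same conversion, but reading directly from a byte buffer,
// so no intermediate UTF8String copy is needed
function My_UTF8Decode(const S: PAnsiChar; const SLen: Integer): WideString;
var
  WLen: Integer;
begin
  // First call: ask how many UTF-16 code units the output needs
  WLen := MultiByteToWideChar(CP_UTF8, 0, S, SLen, nil, 0);
  if WLen > 0 then
  begin
    SetLength(Result, WLen);
    // Second call: perform the actual conversion into Result
    MultiByteToWideChar(CP_UTF8, 0, S, SLen, PWideChar(Result), WLen);
  end else
    Result := ''; // empty input or conversion failure
end;

var
  lDecodedResponse: WideString;
begin
  lDecodedResponse := My_UTF8Decode(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
end;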
Or, use a 3rd party Unicode conversion library, like ICU or libiconv, which handle this for you.
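If you would rather drop the emojis than keep them, as the question asks, one option is to filter the WideString after it has been decoded correctly. The following is only a sketch under that assumption: StripNonBMP is a hypothetical helper name, and it simply removes UTF-16 surrogate code units (U+D800..U+DFFF), which is how codepoints above U+FFFF are stored in a WideString. It does not remove emojis that live below U+FFFF, nor combining marks such as variation selectors.

```pascal
program StripEmoji;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns W without any UTF-16 surrogate code units,
// i.e. without characters above U+FFFF (which includes most emojis).
function StripNonBMP(const W: WideString): WideString;
var
  I, N: Integer;
  C: Word;
begin
  SetLength(Result, Length(W)); // worst case: nothing is removed
  N := 0;
  for I := 1 to Length(W) do
  begin
    C := Ord(W[I]);
    // Keep everything outside the surrogate range $D800..$DFFF
    if (C < $D800) or (C > $DFFF) then
    begin
      Inc(N);
      Result[N] := W[I];
    end;
  end;
  SetLength(Result, N); // trim to the number of characters kept
end;

var
  W: WideString;
begin
  W := 'Hi ';
  // U+1F600 (😀) as a UTF-16 surrogate pair, followed by '!'
  W := W + WideChar($D83D) + WideChar($DE00) + '!';
  Writeln(Length(W));              // 6 code units before stripping
  Writeln(Length(StripNonBMP(W))); // 4 code units after stripping
end.
```

In the code from the question this would be applied as lDecodedResponse := StripNonBMP(My_UTF8Decode(lStringResponse)), so the accented Spanish characters survive and only the surrogate pairs are removed.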