delphi amazon-s3 unicode-string utf-16le

How can I get a regular Delphi string from a stream after retrieving an object from Amazon S3?


I am putting a JSON string into Amazon S3 using the UploadObject method of the TAmazonStorageService class. When I retrieve the object it is placed in a stream (I am using a TStringStream), and the data appears to be encoded in UTF-16 LE. If I then attempt to load that JSON into a memo, a TStringList, or any similar object, I get just the first character, the opening curly brace of the JSON. On the other hand, if I write the stream to a file I get the entire JSON (UTF-16 LE encoded). I am assuming that because UTF-16 LE encodes each of these characters with two bytes, and the second byte is always 0, Delphi is treating the 0 as a terminator.

How can I get a regular Delphi string (WideString), or even an AnsiString, from the TStringStream? Or is there another stream class I should use to get a WideString or AnsiString?

Here is pseudocode that represents the upload:

procedure StorePayload( AmazonConnectionInfo: TAmazonConnectionInfo; JSONString: String;
                        PayloadMemTable: TFDAdaptedDataSet;
                        PayloadType: String; PayloadVersion: Integer);
var
  AmazonStorageService: TAmazonStorageService;
  ab: TBytes;
  ResponseInfo: TCloudResponseInfo;
  ss: TStringStream;
  Guid: TGuid;
begin
  Guid := TGuid.NewGuid;
  AmazonStorageService := TAmazonStorageService.Create( AmazonConnectionInfo );
  try
    // Write payload to S3
    ResponseInfo := TCloudResponseInfo.Create;
    try
      ss := TStringStream.Create( JSONString );
      try
        ab := StringToBytes( ss.DataString );
        if AmazonStorageService.UploadObject( BucketName, Guid.ToString, ab, false, nil, nil, amzbaPrivate, ResponseInfo ) then
          PayloadMemTable.AppendRecord( [Guid.ToString, PayloadType, PayloadVersion, now() ] );
      finally
        ss.Free;
      end;
    finally
      ResponseInfo.Free;
    end;
  finally
    AmazonStorageService.Free;
  end;
end;

And here is pseudocode that represents the retrieval of the JSON:

function RetrievePayload( AmazonConnectionInfo: TAmazonConnectionInfo ): String;
var
  AmazonStorageService: TAmazonStorageService;
  ObjectName: string;
  ResponseInfo: TCloudResponseInfo;
  ss: TStringStream;
  OptParams: TAmazonGetObjectOptionals;
begin
  // I tried with and without the TAmazonGetObjectOptionals
  OptParams := TAmazonGetObjectOptionals.Create;
  OptParams.ResponseContentEncoding := 'ANSI';
  OptParams.ResponseContentType := 'text/plain';
  AmazonStorageService := TAmazonStorageService.Create( AmazonConnectionInfo );
  try
    ss := TStringStream.Create( );
    try
      ResponseInfo := TCloudResponseInfo.Create;
      try
        if not AmazonStorageService.GetObject( BucketName, PayloadID, OptParams, 
                                               ss, ResponseInfo, amzrNotSpecified ) then
          raise Exception.Create('Error retrieving item ' + ObjectName);
        Result := ss.DataString;
        // The memo will contain only {
        Form1.Memo1.Lines.Text := ss.DataString;
      finally
        ResponseInfo.Free;
      end;
    finally
      ss.Free;
    end;
  finally
    AmazonStorageService.Free;
  end;
end;

Solution

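  • In Delphi 2009 and later, String is a UTF-16 UnicodeString. However, a TStringStream created without an explicit encoding uses TEncoding.Default, which is 8-bit ANSI on Windows (for backwards compatibility with pre-Unicode Delphi versions). Your RetrievePayload() therefore decodes the UTF-16 bytes as ANSI, so DataString contains a #0 after every ASCII character, and the memo stops at the first #0.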

    There is no need for StorePayload() to use TStringStream at all. You are storing a String into the stream just to read a String back out from it. So just use the original String as-is.

    Using StringToBytes() is unnecessary, too. You can, and should, use TEncoding.UTF8 instead, as UTF-8 is the preferred encoding for JSON data, e.g.:

    procedure StorePayload( AmazonConnectionInfo: TAmazonConnectionInfo; JSONString: String;
                            PayloadMemTable: TFDAdaptedDataSet;
                            PayloadType: String; PayloadVersion: Integer);
    var
      AmazonStorageService: TAmazonStorageService;
      ab: TBytes;
      ResponseInfo: TCloudResponseInfo;
      Guid: TGuid;
    begin
      Guid := TGuid.NewGuid;
      AmazonStorageService := TAmazonStorageService.Create( AmazonConnectionInfo );
      try
        // Write payload to S3
        ResponseInfo := TCloudResponseInfo.Create;
        try
          ab := TEncoding.UTF8.GetBytes( JSONString );
          if AmazonStorageService.UploadObject( BucketName, Guid.ToString, ab, false, nil, nil, amzbaPrivate, ResponseInfo ) then
            PayloadMemTable.AppendRecord( [Guid.ToString, PayloadType, PayloadVersion, Now() ] );
        finally
          ResponseInfo.Free;
        end;
      finally
        AmazonStorageService.Free;
      end;
    end;
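
    To see why the encoding matters, here is a minimal console sketch (the program name and the sample JSON are made up for illustration) that compares the bytes produced by the two encodings. In the UTF-16 LE buffer every ASCII character is followed by a zero byte, which is consistent with a memo that shows only the opening brace when those bytes are decoded as ANSI:

    program EncodingDemo;
    {$APPTYPE CONSOLE}
    uses
      System.SysUtils;
    const
      SampleJSON = '{"a":1}'; // hypothetical payload, 7 characters
    var
      Utf16, Utf8: TBytes;
    begin
      Utf16 := TEncoding.Unicode.GetBytes( SampleJSON ); // UTF-16 LE: 2 bytes per character here
      Utf8 := TEncoding.UTF8.GetBytes( SampleJSON );     // UTF-8: 1 byte per ASCII character
      Writeln( Length( Utf16 ) );          // 14
      Writeln( Length( Utf8 ) );           // 7
      Writeln( Utf16[0], ' ', Utf16[1] );  // 123 0  ('{' followed by a zero byte)
    end.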
    

    Conversely, when RetrievePayload() calls GetObject() later, you can use TEncoding.UTF8 with TStringStream to decode the bytes into a String, e.g.:

    function RetrievePayload( AmazonConnectionInfo: TAmazonConnectionInfo ): String;
    var
      AmazonStorageService: TAmazonStorageService;
      ResponseInfo: TCloudResponseInfo;
      ss: TStringStream;
    begin
      AmazonStorageService := TAmazonStorageService.Create( AmazonConnectionInfo );
      try
        ss := TStringStream.Create( '', TEncoding.UTF8 );
        try
          ResponseInfo := TCloudResponseInfo.Create;
          try
            if not AmazonStorageService.GetObject( BucketName, PayloadID, ss, ResponseInfo, amzrNotSpecified ) then
              raise Exception.Create('Error retrieving item ' + PayloadID);
            Result := ss.DataString;
            Form1.Memo1.Text := Result;
          finally
            ResponseInfo.Free;
          end;
        finally
          ss.Free;
        end;
      finally
        AmazonStorageService.Free;
      end;
    end;
    

    If you need to retrieve any pre-existing bucket objects that have already been uploaded as UTF-16, RetrievePayload() could use TEncoding.Unicode instead:

    ss := TStringStream.Create( '', TEncoding.Unicode );
    

    However, that won't work for newer objects uploaded as UTF-8. A more flexible solution is to retrieve the raw bytes using a TMemoryStream or TBytesStream, analyze the bytes to determine whether UTF-8 or UTF-16 was used, and then call TEncoding.UTF8.GetString() or TEncoding.Unicode.GetString() to decode the bytes to a String.
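
    That detection step could look something like the sketch below. DecodePayload is a hypothetical helper, not part of the cloud API, and its heuristic is an assumption: it relies on the payload being JSON, so the first character is ASCII and a zero second byte points to UTF-16 LE. It also uses the three-argument overload of TEncoding.GetBufferEncoding(), which should be available in XE and later; check your RTL version.

    // Hypothetical helper: decodes a downloaded JSON payload as UTF-8 or UTF-16 LE.
    // Requires System.SysUtils in the uses clause.
    function DecodePayload( const Bytes: TBytes ): String;
    var
      Encoding: TEncoding;
      BOMLen: Integer;
    begin
      Result := '';
      if Length( Bytes ) = 0 then
        Exit;
      // Honor a byte order mark if one is present; otherwise default to UTF-8.
      Encoding := nil;
      BOMLen := TEncoding.GetBufferEncoding( Bytes, Encoding, TEncoding.UTF8 );
      // Assumption: with no BOM, an ASCII first byte followed by a zero byte means UTF-16 LE.
      if (BOMLen = 0) and (Length( Bytes ) >= 2) and (Bytes[0] <> 0) and (Bytes[1] = 0) then
        Encoding := TEncoding.Unicode;
      // Skip the BOM (if any) and decode the rest.
      Result := Encoding.GetString( Bytes, BOMLen, Length( Bytes ) - BOMLen );
    end;

    In RetrievePayload(), the TStringStream would then be swapped for a TBytesStream (called bs here) and passed to GetObject() in the same way. The Bytes property can be longer than the data actually written, so trim it to Size before decoding:

    Result := DecodePayload( Copy( bs.Bytes, 0, Integer( bs.Size ) ) );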