c++ amazon-s3 aws-sdk-cpp

AWS s3 cpp sdk reading more bytes than range specified


Using the AWS S3 C++ SDK, we are trying to read from a bucket using the code below. When we specify a small range using

Aws::S3::Model::GetObjectRequest object_request;
object_request.SetRange(std::to_string(position) + "-" + std::to_string(position + nbytes));

for example with 0 as the start position and 4 as the end position, we find that the read operation actually returns more bytes than we allocated in our buffer. Our file is 69 bytes long. If we try to read the first 4 bytes from it, then in the result that comes back from

auto results = this->s3Client->GetObject(object_request);

we find that the size of the actual read from the server was 69 bytes, the entire size of the file. Is there a minimum number of bytes that the SDK will read when you request a very small range? Is this value documented somewhere?

Below is the actual function that tries to read data from S3.

arrow::Status S3ReadableFile::Read(int64_t nbytes, int64_t* bytesRead, uint8_t* buffer) {
    Aws::S3::Model::GetObjectRequest object_request;

    object_request.SetBucket(bucketName);
    object_request.SetKey(key);
    object_request.SetRange(std::to_string(position) + "-" + std::to_string(position + nbytes));

    auto results = this->s3Client->GetObject(object_request);

    if (!results.IsSuccess()) {
        //TODO: Make bad arrow status here
        *bytesRead = 0;
        return arrow::Status::IOError("Unable to fetch object from s3 bucket.");
    } else {
        // bytes read should always be the full amount
        *bytesRead = nbytes; //should almost always be nBytes
        memcpy(buffer, results.GetResult().GetBody().rdbuf(), *bytesRead);
        position += *bytesRead;
        return arrow::Status::OK();
    }
}

These are the private members of the class S3ReadableFile:

    std::shared_ptr<Aws::S3::S3Client> s3Client;
    std::string bucketName;
    std::string key;
    size_t position;
    bool valid;

Solution

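  • The value of Range must include the unit prefix, e.g. "bytes=0-4". Without "bytes=" the value is not a valid byte-range specifier, which is consistent with the whole 69-byte object being returned. Note that the end position is inclusive, so "bytes=0-4" requests 5 bytes; the first 4 bytes are "bytes=0-3", and the question's position + nbytes end value asks for one byte too many. See: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35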
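
A minimal sketch of how the range string could be built, assuming the goal is to read exactly nbytes bytes starting at position. MakeByteRange and ReadBody are hypothetical helper names, not part of the AWS SDK. ReadBody is included because memcpy from rdbuf() copies the stream buffer object itself rather than the data it holds; reading through the stream and checking gcount() also reports how many bytes actually arrived. The SDK calls appear only in comments and have not been tested against S3.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // only needed to exercise ReadBody with an in-memory stream
#include <string>

// Hypothetical helper: builds an HTTP Range header value that covers
// `nbytes` bytes starting at `position`. The last-byte position in a
// byte range is inclusive, hence the "- 1".
std::string MakeByteRange(int64_t position, int64_t nbytes) {
    assert(position >= 0 && nbytes > 0);
    return "bytes=" + std::to_string(position) + "-" +
           std::to_string(position + nbytes - 1);
}

// Hypothetical helper: copies up to `nbytes` bytes from a response body
// stream into `buffer` and returns the number of bytes actually read.
int64_t ReadBody(std::istream& body, uint8_t* buffer, int64_t nbytes) {
    body.read(reinterpret_cast<char*>(buffer),
              static_cast<std::streamsize>(nbytes));
    return static_cast<int64_t>(body.gcount());
}

// Assumed use inside S3ReadableFile::Read:
//   object_request.SetRange(MakeByteRange(position, nbytes).c_str());
//   auto results = this->s3Client->GetObject(object_request);
//   ...
//   *bytesRead = ReadBody(results.GetResult().GetBody(), buffer, nbytes);
//   position += *bytesRead;
```

With these helpers, a request for the first 4 bytes of the 69-byte file would send "bytes=0-3", and *bytesRead would reflect what the server really returned instead of being assumed equal to nbytes.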