r amazon-s3 httr2

Reformulate an AWS pre-signed URL with query parameters as a header-based authorization


I have a presigned URL for a GET request like:

https://<my-bucket>.s3.amazonaws.com/<path/to/file.dat>?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<credential-string>%2F20241107%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241107T202532Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=<signature-string>

I want to use httr2 caching, so I want to reformulate this URL using the header-based authorization so that the cache doesn't get invalidated just because the date and credentials change. However, I can't get the header-based request working. Here's what I'm trying, based on this documentation:

aws_authorization = "AWS4-HMAC-SHA256Credential=<credential-string>%2F20241107%2Fus-east-1%2Fs3%2Faws4_request,SignedHeaders=host;x-amz-date=20241107T202532Z;x-amz-expires=900,Signature=<signature-string>"

For clarity, here's the Authorization header again with line breaks:

aws_authorization = "AWS4-HMAC-SHA256
Credential=<credential-string>%2F20241107%2Fus-east-1%2Fs3%2Faws4_request,
SignedHeaders=host;x-amz-date=20241107T202532Z;x-amz-expires=900,
Signature=<signature-string>"

I then reconstruct the request as

req = request("https://<my-bucket>.s3.amazonaws.com/<path/to/file.dat>") |>
  req_headers(Authorization = aws_authorization)

req
#> <httr2_request>
#> GET
#> https://<my-bucket>.s3.amazonaws.com/<path/to/file.dat>
#> Headers:
#> • Authorization:
#> 'AWS4-HMAC-SHA256Credential=<credential-string>/20241107/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-date=20241107T202532Z;x-amz-expires=900,Signature=<signature-string>'
#> Body: empty

But I get an HTTP 400 Bad Request error when I perform the request:

req_perform(req)
#> HTTP 400 Bad Request.

Where am I going wrong in constructing the request?

EDIT

Modified based on comments:

  1. Remove x-amz-expires from SignedHeaders.
  2. Remove = from SignedHeaders, and define referenced headers.

So now my Authorization string is

AWS4-HMAC-SHA256Credential=<credential-string>%2Fus-east-1%2Fs3%2Faws4_request,SignedHeaders=host;x-amz-date,Signature=<signature-string>

And I have added headers

X-Amz-Date = 20241107T202532Z
X-Amz-Expires = 900

Resulting in the request

#> <httr2_request>
#> GET
#> <my-bucket>.s3.amazonaws.com/<path/to/file.dat>
#> Headers:
#> • Authorization:
#> 'AWS4-HMAC-SHA256Credential=<credential-string>%2F20241107%2Fus-east-1%2Fs3%2Faws4_request,SignedHeaders=host;x-amz-date,Signature=<signature-string>'
#> • X-Amz-Date: '20241107T220527Z'
#> • X-Amz-Expires: '900'
#> Body: empty

Solution

  • The following was done using:

    httr2 1.0.6 includes a function, req_auth_aws_v4(), that gets close to what you want, but I couldn't get it to work with AWS S3. However, the req_auth_aws_v4() code is a useful reference if you want to write your own signing code.

    AWS provides helpful debug information in its responses to unsuccessful signing requests. If you are seeing a 400 Bad Request response, I would try the following to stop httr2 throwing an error, so you can see the response content:

    req = request("https://<my-bucket>.s3.amazonaws.com/<path/to/file.dat>") |> req_headers(Authorization = aws_authorization)
        
    req <- req_verbose(
      req,
      header_req = TRUE,
      header_resp = TRUE,
      body_req = TRUE,
      body_resp = TRUE,
      info = TRUE,
      redact_headers = FALSE
    )
    
    tryCatch(
      resp <- req |> req_perform(),
      httr2_http_400 = function(cnd) {
        # Show the raw response, including the error body returned by AWS
        last_response() |> resp_raw()
      }
    )
    

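
The error body that S3 sends back is XML, so once you have captured it you can pull out the fields that matter. Below is a minimal Python sketch of that step. The function name `summarize_s3_error` and the sample body are made up for illustration; the real body will differ, and for signature mismatches it typically also carries elements such as the canonical request and string to sign that AWS computed.

```python
import xml.etree.ElementTree as ET


def summarize_s3_error(body: str) -> dict:
    """Return the child elements of an S3 <Error> document as a dict."""
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(body.encode("utf-8"))
    # Drop any XML namespace prefix so the keys are plain tag names.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


# Hypothetical response body, for illustration only.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AuthorizationHeaderMalformed</Code>
  <Message>The authorization header is malformed.</Message>
  <RequestId>EXAMPLE</RequestId>
</Error>"""

info = summarize_s3_error(sample)
print(info["Code"], "-", info["Message"])
```

The `Code` and `Message` fields are usually enough to tell a malformed header apart from a signature that simply does not match.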
    I wrote (borrowed) the following code using Python requests that successfully returns an AWS S3 object using header authentication:

    import sys, os, base64, datetime, hashlib, hmac
    import requests
    method = 'GET'
    service = 's3'
    host = '<my-bucket>.s3.amazonaws.com'
    region = 'us-east-1'
    endpoint = 'https://<my-bucket>.s3.amazonaws.com'
    request_parameters = ''
    
    def sign(key, msg):
        return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()
    
    def getSignatureKey(key, dateStamp, regionName, serviceName):
        kDate = sign(('AWS4' + key).encode('utf-8'), dateStamp)
        kRegion = sign(kDate, regionName)
        kService = sign(kRegion, serviceName)
        kSigning = sign(kService, 'aws4_request')
        return kSigning
    
    access_key = "AABBCCDDEEFF11223344"
    secret_key = "SomeBigLongSecretKeyToUseForAWSDontShare"
    
    if access_key is None or secret_key is None:
        print('No access key is available.')
        sys.exit()
    
    t = datetime.datetime.now(datetime.timezone.utc)  # current time in UTC, as SigV4 expects
    amzdate = t.strftime('%Y%m%dT%H%M%SZ')
    datestamp = t.strftime('%Y%m%d') # Date w/o time, used in credential scope
    
    canonical_uri = '/<path/to/file.dat>'
    canonical_querystring = request_parameters
    canonical_headers = 'host:' + host + '\n' + 'x-amz-content-sha256:UNSIGNED-PAYLOAD' + '\n' + 'x-amz-date:' + amzdate + '\n'
    signed_headers = 'host;x-amz-content-sha256;x-amz-date'
    payload_hash = 'UNSIGNED-PAYLOAD'
    canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers + '\n' + signed_headers + '\n' + payload_hash
    algorithm = 'AWS4-HMAC-SHA256'
    credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request'
    
    print(canonical_request)
    
    string_to_sign = algorithm + '\n' +  amzdate + '\n' +  credential_scope + '\n' +  hashlib.sha256(canonical_request.encode('utf-8')).hexdigest()
    signing_key = getSignatureKey(secret_key, datestamp, region, service)
    signature = hmac.new(signing_key, (string_to_sign).encode('utf-8'), hashlib.sha256).hexdigest()
    
    print(signature)
    
    authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ', ' +  'SignedHeaders=' + signed_headers + ', ' + 'Signature=' + signature
    headers = {'x-amz-date':amzdate, 'x-amz-content-sha256': 'UNSIGNED-PAYLOAD', 'Authorization':authorization_header}
    request_url = endpoint + canonical_uri
    
    print(authorization_header)
    print('\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++')
    print('Request URL = ' + request_url)
    
    r = requests.get(request_url, headers=headers)
    
    print('\nRESPONSE++++++++++++++++++++++++++++++++++++')
    print('Response code: %d\n' % r.status_code)
    print(r.text)
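
For reuse, the same steps can be wrapped in a function that takes the timestamp as an argument, which also makes the output reproducible. This is only a sketch under a few assumptions: the name `build_sigv4_headers` is mine, it only covers an unsigned-payload GET with no query string, the path is assumed to be URI-encoded already, and it ignores temporary credentials (which would need a session token header as well).

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_sigv4_headers(access_key, secret_key, host, path, amzdate,
                        region="us-east-1", service="s3"):
    """Build headers for a header-authenticated GET of `path` on `host`.

    `amzdate` is a UTC timestamp such as '20241110T170747Z'.
    """
    datestamp = amzdate[:8]  # date without time, used in the credential scope
    payload_hash = "UNSIGNED-PAYLOAD"
    signed_headers = "host;x-amz-content-sha256;x-amz-date"
    canonical_headers = (f"host:{host}\n"
                         f"x-amz-content-sha256:{payload_hash}\n"
                         f"x-amz-date:{amzdate}\n")
    # method, URI, (empty) query string, headers, signed header names, payload hash
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash])

    scope = f"{datestamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amzdate, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Derive the signing key: date -> region -> service -> 'aws4_request'
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    authorization = (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                     f"SignedHeaders={signed_headers}, Signature={signature}")
    return {"x-amz-date": amzdate,
            "x-amz-content-sha256": payload_hash,
            "Authorization": authorization}


headers = build_sigv4_headers(
    "AABBCCDDEEFF11223344", "SomeBigLongSecretKeyToUseForAWSDontShare",
    "<my-bucket>.s3.amazonaws.com", "/<path/to/file.dat>", "20241110T170747Z")
print(headers["Authorization"])
```

The returned dict can be passed straight to `requests.get(url, headers=headers)`; requests adds the host header itself from the URL.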
    

    The important things to note here are that it sends the following headers:

      • Authorization
      • x-amz-content-sha256 (set to UNSIGNED-PAYLOAD)
      • x-amz-date

    and that the Authorization header is formatted as:

    AWS4-HMAC-SHA256 Credential=AABBCCDDEEFF11223344/20241110/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=2ad745df5f4680d42800865d775ca045651eb42819e18592b99c43c3c00b949f
    

    Note the space ' ' separator between the algorithm and Credential:

    AWS4-HMAC-SHA256 Credential
    

    The httr2 req_auth_aws_v4() function appears to generate an Authorization header with a ',' between the algorithm and Credential, which may be the problem:

    AWS4-HMAC-SHA256,Credential
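
A quick way to catch this kind of formatting slip before sending anything is to check the header against the expected shape. The sketch below is a rough sanity check in Python, not an official validator: the pattern and the name `parse_authorization` are my own, and it only covers the layout described above (algorithm, one space, then three comma-separated components).

```python
import re

AUTH_PATTERN = re.compile(
    r"^AWS4-HMAC-SHA256 "                          # algorithm, then a single space
    r"Credential=(?P<credential>[^,\s]+),\s*"      # access key plus credential scope
    r"SignedHeaders=(?P<signed_headers>[a-z0-9;-]+),\s*"
    r"Signature=(?P<signature>[0-9a-f]{64})$"      # hex-encoded HMAC-SHA256
)


def parse_authorization(header: str):
    """Return the header's components as a dict, or None if the shape is wrong."""
    match = AUTH_PATTERN.match(header)
    return match.groupdict() if match else None


good = ("AWS4-HMAC-SHA256 Credential=AABBCCDDEEFF11223344/20241110/us-east-1/s3/aws4_request,"
        "SignedHeaders=host;x-amz-content-sha256;x-amz-date,"
        "Signature=2ad745df5f4680d42800865d775ca045651eb42819e18592b99c43c3c00b949f")
no_space = good.replace("AWS4-HMAC-SHA256 ", "AWS4-HMAC-SHA256", 1)
with_comma = good.replace("AWS4-HMAC-SHA256 ", "AWS4-HMAC-SHA256,", 1)

print(parse_authorization(good) is not None)
print(parse_authorization(no_space))
print(parse_authorization(with_comma))
```

Both the missing-space form used in the question and the comma form fail the check, while the space-separated form parses.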
    

    If you have a valid signature, the following headers seem to work using httr2:

    library(httr2)
    
    aws_date = "20241110"
    aws_time = "170747"
    sig = "64aa96f4dceaa3b782245ba9a070621190e1feab93c349bc6a5a83a8ce8ee4e3"
    
    # Tested with AWS S3
    #
    # The Authorization header for AWSV4 appears to require a space " " after the Algorithm AWS4-HMAC-SHA256
    #
    
    aws_authorization = paste("AWS4-HMAC-SHA256 Credential=AABBCCDDEEFF11223344/", aws_date, "/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=", sig, sep = "")
    
    req <- request("https://<my-bucket>.s3.amazonaws.com")
    req <- req |> req_url_path(path = "<path/to/file.dat>")
    req <- req |> req_headers("Authorization" = aws_authorization, "host" = "<my-bucket>.s3.amazonaws.com", "x-amz-content-sha256" = "UNSIGNED-PAYLOAD", "x-amz-date" = paste(aws_date,"T",aws_time,"Z", sep = ""))
    
    req <- req_verbose(
      req,
      header_req = TRUE,
      header_resp = TRUE,
      body_req = TRUE,
      body_resp = TRUE,
      info = TRUE,
      redact_headers = FALSE
    )
    
    req
    
    tryCatch(
      resp <- req |> req_perform(),
      httr2_http_400 = function(cnd) {
       last_response() |> resp_raw()
      }
    )
    
    resp |> resp_body_raw()
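
One caveat on "a valid signature": as far as I understand SigV4, the signature taken from a pre-signed URL is probably not valid for a header-based request, because the two are computed over different canonical requests. In the pre-signed case the X-Amz-* parameters are part of the canonical query string and only host is signed; in the header case the query string is empty and more headers are signed. The Python sketch below builds both canonical requests with the placeholders from the question and compares their hashes. It is an illustration of the difference, not a working signer, and the placeholder values are not URI-encoded.

```python
import hashlib

host = "<my-bucket>.s3.amazonaws.com"
path = "/<path/to/file.dat>"
amzdate = "20241107T202532Z"

# Pre-signed URL: the auth parameters (all except the signature itself)
# form the canonical query string, sorted by name; only 'host' is signed.
presigned_query = "&".join([
    "X-Amz-Algorithm=AWS4-HMAC-SHA256",
    "X-Amz-Credential=<credential-string>%2F20241107%2Fus-east-1%2Fs3%2Faws4_request",
    "X-Amz-Date=" + amzdate,
    "X-Amz-Expires=900",
    "X-Amz-SignedHeaders=host",
])
presigned_canonical = "\n".join(
    ["GET", path, presigned_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])

# Header-based request: empty query string, three signed headers.
header_canonical = "\n".join([
    "GET", path, "",
    f"host:{host}\nx-amz-content-sha256:UNSIGNED-PAYLOAD\nx-amz-date:{amzdate}\n",
    "host;x-amz-content-sha256;x-amz-date",
    "UNSIGNED-PAYLOAD",
])


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The string to sign embeds this hash, so different hashes mean different signatures.
print(digest(presigned_canonical) == digest(header_canonical))
```

So to use header authentication you most likely need the access key and secret key to compute a fresh signature for each request, rather than reusing the one embedded in the pre-signed URL.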
    

    Hopefully, with some changes to the httr2 req_auth_aws_v4() function this can be simplified to the following in the future.

    library(httr2)
    
    req <- request("https://<my-bucket>.s3.amazonaws.com/<path/to/file.dat>")
    
    req <- req_auth_aws_v4(
      req,
      aws_access_key_id = "AABBCCDDEEFF11223344",
      aws_secret_access_key = "SomeBigLongSecretKeyToUseForAWSDontShare",
      aws_service = "s3",
      aws_region = "us-east-1"
    )
    
    resp <- req_perform(req)
    
    resp