I have a Cloudfront distribution that is backed by an S3 bucket. I'm trying to list the contents of a subdirectory (prefix) within the bucket, but I'm getting a list of the entire bucket contents rather than just the objects with a given prefix.
It's important to note that I have to do this by making an HTTPS request directly to the CloudFront domain (not by using the AWS CLI or the AWS S3 APIs).
I am able to successfully download/upload objects from the Cloudfront domain using signed cookies to authenticate. And I can clearly list all the objects on the bucket (the auth works), but I'm not able to return only objects with a given prefix.
I also have a bucket policy that allows the following on the bucket itself:
s3:ListBucket
s3:ListBucketMultipartUploads
And the following on the bucket objects:
s3:GetObject
s3:GetObjectAttributes
s3:GetObjectVersion
s3:GetObjectTagging
s3:GetObjectVersionTagging
s3:PutObject
s3:PutObjectVersionTagging
s3:PutObjectTagging
s3:DeleteObject
s3:DeleteObjectTagging
s3:DeleteObjectVersionTagging
s3:AbortMultipartUpload
s3:ListMultipartUploadParts
When I try this query, the response contains the entire contents of the bucket, not just the objects under the prefix:
import requests
cookies = {...} # cookies needed to authenticate with cloudfront; this is working correctly
res = requests.get("https://example.cloudfront.net/?prefix=myprefix/", cookies=cookies)
... it gives me a big xml document.
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>mybucket</Name>
<Prefix/>
<Marker/>
<MaxKeys>1000</MaxKeys>
<IsTruncated>true</IsTruncated>
<Contents>
<Key>some-other-prefix-I-do-not-want/myfile.text</Key>
...
</Contents>
<Contents>
...
</ListBucketResult>
Or if I try this query, the params have no effect and I still get the entire bucket contents (I think this is using the S3 API, so maybe that's why this particular query doesn't work):
params = {"list-type": "2", "delimiter": "/", "prefix": "myprefix"}
res = requests.get("https://example.cloudfront.net", cookies=cookies, params=params)
When I try this query, I get a 404.
res = requests.get("https://example.cloudfront.net/myprefix/", cookies=cookies)
Does anyone know how I can list only the objects with a given prefix when listing the files from a Cloudfront distribution?
I found the answer to this and am posting it for anyone who encounters it. I needed to add a custom Origin Request Policy on my CloudFront distribution. The policy is a copy of the AWS-provided CORS-S3Origin policy, with the additional query strings added to allow those query parameters to pass through CloudFront to the S3 API.
So, for example, under CloudFront > Policies > Origin request > Create origin request policy, I created a policy with all of the headers from the CORS-S3Origin policy, and I added Query strings for list-type, max-keys, delimiter, prefix, and start-after. Basically, I added any parameters that I wanted to pass through CloudFront to S3, as documented in AWS's S3 API Reference.
Then, I just needed to attach that policy to my CloudFront distribution by going to CloudFront > Distributions > My Distribution and editing the distribution. There, I selected the "Behaviors" tab, edited the behavior, and changed the Origin request policy to the custom policy that I had just created.
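For anyone who prefers to script the console steps above, here is a rough sketch using boto3. This is not what I actually ran: the policy name, the comment text, and the helper names build_policy_config and create_policy are made up for illustration, and the header list is my assumption of what the managed CORS-S3Origin policy contains, so check it against the console before relying on it.

```python
# Headers assumed to match the managed CORS-S3Origin policy (verify in the console).
CORS_S3_ORIGIN_HEADERS = [
    "origin",
    "access-control-request-headers",
    "access-control-request-method",
]

# Query strings that should pass through CloudFront to S3.
LIST_QUERY_STRINGS = ["list-type", "max-keys", "delimiter", "prefix", "start-after"]


def build_policy_config(name):
    """Build the OriginRequestPolicyConfig dict (hypothetical helper)."""
    return {
        "Name": name,
        "Comment": "CORS-S3Origin headers plus S3 list query strings",
        "HeadersConfig": {
            "HeaderBehavior": "whitelist",
            "Headers": {
                "Quantity": len(CORS_S3_ORIGIN_HEADERS),
                "Items": list(CORS_S3_ORIGIN_HEADERS),
            },
        },
        # Signed cookies are checked by CloudFront itself, so none are forwarded.
        "CookiesConfig": {"CookieBehavior": "none"},
        "QueryStringsConfig": {
            "QueryStringBehavior": "whitelist",
            "QueryStrings": {
                "Quantity": len(LIST_QUERY_STRINGS),
                "Items": list(LIST_QUERY_STRINGS),
            },
        },
    }


def create_policy(name="s3-list-query-strings"):
    """Create the policy and return its Id (needs AWS credentials)."""
    import boto3  # imported lazily so build_policy_config works without boto3

    client = boto3.client("cloudfront")
    resp = client.create_origin_request_policy(
        OriginRequestPolicyConfig=build_policy_config(name)
    )
    return resp["OriginRequestPolicy"]["Id"]
```

The returned Id still has to be attached to the distribution's behavior, which I did through the console as described above.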
Once I made that change, I was able to filter the response as follows:
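import requests
from lxml import objectify

cookies = {...}  # the same CloudFront signed cookies used in the question
res = requests.get("https://example.cloudfront.net/?prefix=path/inside/my/bucket", cookies=cookies)
# Parse the ListBucketResult XML; res.content is already bytes
root = objectify.fromstring(res.content)
# Note: root.Contents raises AttributeError if the listing is empty
for contents in root.Contents:
    print(contents.Key)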
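Since the listing is capped at MaxKeys (1000 in the response above) and IsTruncated can be true, a single request may not return everything under the prefix. Below is a minimal sketch of paging with start-after, using only the standard library to parse the XML. It assumes list-type, prefix, and start-after are all allowed through by the origin request policy. The names parse_listing, list_keys, and fetch are hypothetical; fetch stands in for whatever function performs the HTTPS GET and returns the response body as bytes. It also checks the echoed Prefix element, because an empty Prefix (as in the question) suggests the query strings never reached S3.

```python
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_bytes):
    """Return (echoed_prefix, keys, is_truncated) from a ListBucketResult."""
    root = ET.fromstring(xml_bytes)
    prefix = root.findtext("s3:Prefix", default="", namespaces=S3_NS) or ""
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = (
        root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    )
    return prefix, keys, truncated


def list_keys(fetch, prefix):
    """Collect all keys under prefix; fetch(params) -> bytes does the request."""
    keys = []
    start_after = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if start_after:
            params["start-after"] = start_after
        echoed, page, truncated = parse_listing(fetch(params))
        if echoed != prefix:
            # S3 echoes the prefix it applied; a mismatch means it was dropped.
            raise RuntimeError("prefix not applied; query strings may not reach S3")
        keys.extend(page)
        if not truncated or not page:
            return keys
        # Keys come back in sorted order, so resume after the last one seen.
        start_after = page[-1]
```

With requests, fetch could be something like lambda params: requests.get("https://example.cloudfront.net/", cookies=cookies, params=params).content, and the call would be list_keys(fetch, "path/inside/my/bucket").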