I am trying to set up a CloudFront distribution which caches all resources except for HTML files while enforcing clean URLs through a CloudFront Function. However, it seems like my current setup is caching HTML files as well. I would like to understand why and how to fix it.
I have defined two cache behaviors in my distribution: one with an *.html path pattern and the Managed-CachingDisabled policy, and another default behavior (*) with the Managed-CachingOptimized policy.
This is what the behaviors look like in my CloudFormation template:
DefaultCacheBehavior:
  TargetOriginId: s3origin
  CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6 # CachingOptimized
  ViewerProtocolPolicy: redirect-to-https
  AllowedMethods: [GET, HEAD, OPTIONS]
  CachedMethods: [GET, HEAD, OPTIONS]
  Compress: true
  FunctionAssociations:
    - EventType: viewer-request
      FunctionARN: !GetAtt RedirectFunction.Outputs.FunctionArn
CacheBehaviors:
  - PathPattern: '*.html'
    TargetOriginId: s3origin
    CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
    ViewerProtocolPolicy: redirect-to-https
    AllowedMethods: [GET, HEAD, OPTIONS]
    CachedMethods: [GET, HEAD, OPTIONS]
    Compress: true
    FunctionAssociations:
      - EventType: viewer-request
        FunctionARN: !GetAtt RedirectFunction.Outputs.FunctionArn
Both behaviors also have a CloudFront Function associated with them at the viewer-request stage, which enforces "clean URLs" by loading an "index.html" file from any path ending in a trailing slash, and redirecting any "unclean URLs" to this path. This function looks like this:
var indexDocument = 'index.html';

function redirect(uri) {
  return {
    statusCode: 301,
    statusDescription: 'Moved Permanently',
    headers: { location: { value: uri } },
  };
}

function handler(event) {
  var request = event.request;
  var uri = request.uri || '/';
  if (uri.endsWith('/')) {
    // add index document and return properly-formatted requests
    request.uri = uri + indexDocument;
    return request;
  }
  if (uri.endsWith('/' + indexDocument)) {
    // trim index document
    return redirect(uri.slice(0, -indexDocument.length));
  }
  if (!request.uri.includes('.')) {
    // add trailing slash
    return redirect(uri + '/');
  }
  return request;
}
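To make the three branches concrete, here is a small Node.js harness that feeds a few sample URIs through the handler. The handler is reproduced from above only so the snippet runs on its own; the `describe` helper and the sample paths are hypothetical, and the event object is reduced to the one field the handler reads.

```javascript
// Copy of the handler above, so this snippet is self-contained.
var indexDocument = 'index.html';

function redirect(uri) {
  return {
    statusCode: 301,
    statusDescription: 'Moved Permanently',
    headers: { location: { value: uri } },
  };
}

function handler(event) {
  var request = event.request;
  var uri = request.uri || '/';
  if (uri.endsWith('/')) {
    request.uri = uri + indexDocument;
    return request;
  }
  if (uri.endsWith('/' + indexDocument)) {
    return redirect(uri.slice(0, -indexDocument.length));
  }
  if (!request.uri.includes('.')) {
    return redirect(uri + '/');
  }
  return request;
}

// Hypothetical helper: summarize what the handler does for a viewer URI.
function describe(viewerUri) {
  var result = handler({ request: { uri: viewerUri } });
  if (result.statusCode) {
    // A response object was returned, so the viewer gets a redirect.
    return '301 -> ' + result.headers.location.value;
  }
  // A request object was returned, so it continues with this URI.
  return 'forward ' + result.uri;
}

var samples = ['/', '/about', '/about/', '/about/index.html', '/css/site.css'];
samples.forEach(function (uri) {
  console.log(uri + ' => ' + describe(uri));
});
```

Note that a viewer never ends up requesting a URI ending in /index.html directly: such requests are redirected, and the index document is only appended inside the function.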
I haven't been able to find any documentation on this, but my guess is that CloudFront picks a cache behavior based on the initial viewer request, which is a clean URL without a .html extension, and not on the modified request returned by my CloudFront Function, which adds the index.html document to the URI.
If this is true, how can I specify different cache behaviors for HTML files while using clean URL redirects? If this isn't the case, why might my distribution be caching HTML files?
I realized that I was confusing CloudFront caches (caching files on edge locations) with client caches controlled by the Cache-Control HTTP header. A straightforward way to set the latter is to set the Cache-Control metadata on the S3 objects themselves. That is what I actually wanted.
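As a rough illustration of the client-cache side, the helper below picks a Cache-Control value per object key. The function name `cacheControlFor` and the directive values are assumptions made for this sketch, not something from my setup; the returned string is the kind of value that could be supplied as the object's Cache-Control metadata at upload time (for example, the CacheControl parameter of an S3 PutObject call).

```javascript
// Hypothetical helper: choose a Cache-Control value for an S3 object key.
// The directive values below are illustrative assumptions.
function cacheControlFor(key) {
  if (key.endsWith('.html')) {
    // Browsers may store HTML but must revalidate it before reuse.
    return 'no-cache';
  }
  // Other assets may be reused by browsers for one year (31536000 seconds).
  return 'public, max-age=31536000';
}

['index.html', 'about/index.html', 'css/site.css'].forEach(function (key) {
  console.log(key + ': ' + cacheControlFor(key));
});
```

A long max-age for non-HTML assets is only safe if their file names change whenever their content changes, which this sketch simply assumes.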
It is also possible to exclude HTML files from the edge caches while still enforcing clean URLs, with the following behaviors:
DefaultCacheBehavior:
  TargetOriginId: s3origin
  CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
  ViewerProtocolPolicy: redirect-to-https
CacheBehaviors:
  - PathPattern: '*.html'
    TargetOriginId: s3origin
    CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
    ViewerProtocolPolicy: redirect-to-https
  - PathPattern: '*.*'
    TargetOriginId: s3origin
    CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6 # CachingOptimized
    ViewerProtocolPolicy: redirect-to-https
But after realizing my mistake, I'm not sure if this is ever useful / desirable.
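For anyone who wants to see why the three behaviors above sort requests the way they do, here is a simplified local model of the matching. It assumes that behaviors are tried in the listed order against the viewer's original URI (before the function appends index.html), that * matches zero or more characters and ? exactly one, and that a leading slash is optional. The `matches` and `policyFor` helpers are hypothetical names for this sketch, not CloudFront APIs.

```javascript
// Simplified model of a path pattern check: '*' matches zero or more
// characters, '?' matches exactly one, everything else is literal.
function matches(pattern, uri) {
  var p = pattern.replace(/^\//, ''); // leading slash treated as optional
  var u = uri.replace(/^\//, '');
  var source = p
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.');
  return new RegExp('^' + source + '$').test(u);
}

// Behaviors in the order listed above; the default behavior is the fallback.
var behaviors = [
  { pathPattern: '*.html', policy: 'CachingDisabled' },
  { pathPattern: '*.*', policy: 'CachingOptimized' },
];
var defaultPolicy = 'CachingDisabled';

// Return the cache policy of the first matching behavior for a viewer URI.
function policyFor(viewerUri) {
  for (var i = 0; i < behaviors.length; i++) {
    if (matches(behaviors[i].pathPattern, viewerUri)) {
      return behaviors[i].policy;
    }
  }
  return defaultPolicy;
}

['/', '/about/', '/about/index.html', '/css/site.css'].forEach(function (uri) {
  console.log(uri + ' => ' + policyFor(uri));
});
```

Under these assumptions, clean URLs such as /about/ contain no dot, fall through to the default behavior, and are not cached at the edge, while anything with a non-HTML extension is caught by the *.* behavior and cached.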