amazon-web-services caching amazon-cloudfront clean-urls

CloudFront Cache Behaviors targeting `*.html` files with "Clean URL" redirects


I am trying to set up a CloudFront distribution which caches all resources except for HTML files while enforcing clean URLs through a CloudFront Function. However, it seems like my current setup is caching HTML files as well. I would like to understand why and how to fix it.

I have defined two cache behaviors in my distribution: one with an *.html path pattern and the Managed-CachingDisabled policy, and another default behavior (*) with the Managed-CachingOptimized policy.

This is what the behaviors look like in my CloudFormation template:

DefaultCacheBehavior:
  TargetOriginId: s3origin
  CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6 # CachingOptimized
  ViewerProtocolPolicy: redirect-to-https
  AllowedMethods: [GET, HEAD, OPTIONS]
  CachedMethods: [GET, HEAD, OPTIONS]
  Compress: true
  FunctionAssociations:
    - EventType: viewer-request
      FunctionARN: !GetAtt RedirectFunction.Outputs.FunctionArn
CacheBehaviors:
  - PathPattern: '*.html'
    TargetOriginId: s3origin
    CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
    ViewerProtocolPolicy: redirect-to-https
    AllowedMethods: [GET, HEAD, OPTIONS]
    CachedMethods: [GET, HEAD, OPTIONS]
    Compress: true
    FunctionAssociations:
      - EventType: viewer-request
        FunctionARN: !GetAtt RedirectFunction.Outputs.FunctionArn

Both behaviors also have a CloudFront Function associated with them at the viewer-request stage. It enforces "clean URLs" by serving the "index.html" file for any path ending in a trailing slash, and by redirecting any "unclean" URL to the corresponding clean path. The function looks like this:

var indexDocument = 'index.html';

function redirect(uri) {
    return {
        statusCode: 301,
        statusDescription: 'Moved Permanently',
        headers: { location: { value: uri } },
    };
}

function handler(event) {
    var request = event.request;
    var uri = request.uri || '/';

    if (uri.endsWith('/')) {
        // add index document and return properly-formatted requests
        request.uri = uri + indexDocument;
        return request;
    }

    if (uri.endsWith('/' + indexDocument)) {
        // trim index document
        return redirect(uri.slice(0, -indexDocument.length));
    }

    if (!uri.includes('.')) {
        // add trailing slash
        return redirect(uri + '/');
    }

    return request;
}

I haven't been able to find any documentation on this, but my guess is that CloudFront picks a cache behavior based on the initial viewer request, which is a clean URL without a .html extension, and not on the rewritten request returned by my CloudFront Function, which appends the index.html document to the URI.

If this is true, how can I specify different cache behaviors for HTML files while using clean URL redirects? If this isn't the case, why might my distribution be caching HTML files?
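
The guess above can be sketched as a small simulation. This is only a rough model under stated assumptions, not CloudFront's actual implementation: `patternToRegExp`, `selectPolicy`, and the `original` object are hypothetical names made up for illustration, and the model assumes the behavior is chosen from the viewer's request path before the function rewrites it.

```javascript
// Hypothetical model of CloudFront path-pattern matching, for illustration only.
// Models just the two documented wildcards: '*' (zero or more characters)
// and '?' (exactly one character). A leading '/' in the pattern is optional.
function patternToRegExp(pattern) {
    var body = pattern
        .replace(/^\//, '')                      // drop optional leading slash
        .replace(/[.+^${}()|[\]\\]/g, '\\$&')    // escape regex metacharacters
        .replace(/\*/g, '.*')                    // '*' -> any run of characters
        .replace(/\?/g, '.');                    // '?' -> exactly one character
    return new RegExp('^/' + body + '$');
}

// Returns the cache policy of the first behavior whose pattern matches the
// path, falling back to the default behavior's policy.
function selectPolicy(path, distribution) {
    for (var i = 0; i < distribution.behaviors.length; i++) {
        var behavior = distribution.behaviors[i];
        if (patternToRegExp(behavior.pathPattern).test(path)) {
            return behavior.cachePolicy;
        }
    }
    return distribution.defaultPolicy;
}

// The two behaviors from the template above.
var original = {
    behaviors: [{ pathPattern: '*.html', cachePolicy: 'CachingDisabled' }],
    defaultPolicy: 'CachingOptimized'
};

console.log(selectPolicy('/about/', original));           // CachingOptimized
console.log(selectPolicy('/about/index.html', original)); // CachingDisabled
console.log(selectPolicy('/app.js', original));           // CachingOptimized
```

Under this model, a clean URL such as `/about/` never matches `*.html`, so it would be handled by the default behavior and its CachingOptimized policy, even though the function later rewrites the URI to `/about/index.html`.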


Solution

  • I realized that I was confusing CloudFront caches (files cached at edge locations) with client caches, which are controlled by the Cache-Control HTTP header. What I actually wanted was to control the latter, which I can do by setting the Cache-Control metadata on the S3 objects.

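    It is also possible to exclude HTML files from the edge caches while enforcing clean URLs. CloudFront selects the cache behavior from the path of the viewer request, before the viewer-request function rewrites it, so clean URLs (which contain no dot) have to fall through to an uncached default behavior, and only paths with a file extension other than .html get cached: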

    DefaultCacheBehavior:
      TargetOriginId: s3origin
      CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
      ViewerProtocolPolicy: redirect-to-https
    CacheBehaviors:
      - PathPattern: '*.html'
        TargetOriginId: s3origin
        CachePolicyId: 4135ea2d-6df8-44a3-9df3-4b5a84be39ad # CachingDisabled
        ViewerProtocolPolicy: redirect-to-https
      - PathPattern: '*.*'
        TargetOriginId: s3origin
        CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6 # CachingOptimized
        ViewerProtocolPolicy: redirect-to-https
    

    But after realizing my mistake, I'm not sure whether this is ever useful or desirable.
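
For the client-side caching mentioned at the start of this answer, here is a minimal sketch of choosing a Cache-Control value per object. The helper name `cacheControlFor` and the specific directives are assumptions for illustration, not part of my actual setup; pick values that suit your site.

```javascript
// Hypothetical helper: choose a Cache-Control value for an S3 object key.
// The directives below are illustrative assumptions, not recommendations.
function cacheControlFor(key) {
    if (key.endsWith('.html')) {
        // Clients may store the page but must revalidate it before reuse.
        return 'no-cache';
    }
    // Other assets: allow clients to cache for one day (86400 seconds).
    return 'public, max-age=86400';
}

console.log(cacheControlFor('about/index.html')); // no-cache
console.log(cacheControlFor('assets/app.js'));    // public, max-age=86400
```

The returned string would be supplied as the object's Cache-Control metadata at upload time (for example, the `CacheControl` parameter of the S3 `PutObject` operation), and S3 then sends it back as the `Cache-Control` response header.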