caching, drupal, amazon-cloudfront, varnish

Cloudfront + Varnish - Revalidations


We have a setup with Varnish and CloudFront in front of Drupal 10.

Cloudfront -> Varnish -> Nginx/PHPFPM -> Drupal 10

Drupal suggests turning off the internal page cache for large sites (over 100k pages in our case) and using Varnish as the sole arbiter of cache freshness. Varnish integrates very well with the various cache purging mechanisms available in Drupal.

This approach is widely recommended: https://dev.acquia.com/blog/drupal-cache-strategy-varnish-and-edge-cdn-acquia

However, here is the challenge.

Cloudfront needs to be configured to revalidate non-static pages as clearing cache from Cloudfront is cumbersome and expensive.

When CloudFront sends a revalidation request, Varnish passes it on to the origin, which obviously doesn't have the page cached because the internal page cache is turned off in favour of Varnish. This makes the Varnish proxy quite useless if a CDN like CloudFront or Cloudflare is put in front of it.

We have turned the internal page cache back on, but that makes Varnish redundant in this setup.

We are using the VCL specified in this post: https://www.varnish-software.com/developers/tutorials/configuring-varnish-drupal/

So the question is: is there a way, through Varnish configuration, to make Varnish the sole arbiter of cache freshness so that it doesn't go back to the origin for revalidation within the TTL?


Solution

  • Varnish is not the source of truth in your setup; Drupal takes care of that part. However, Varnish can be used as a source for CloudFront to revalidate content.

    Varnish cache purging through Drupal

    You referred to a Varnish Software Developer Portal tutorial that I wrote. This VCL configuration will not only cache Drupal pages and static assets, it also provides a mechanism to purge these pages when changes occur in Drupal.

    This means you can have longer TTLs for Drupal content without risking that content becoming stale: when you alter content in Drupal, Drupal will connect to Varnish to purge the relevant objects from the cache.

    This means that Varnish (to some extent) can be a reliable source for revalidation.
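
    To make the purge path more concrete, here is a simplified sketch of how such a request can be handled in VCL. It is not the tutorial's actual code, which ships its own version of this logic, so treat it as an illustration rather than something to paste in. The ACL name purge_allowed, the addresses in it, and the use of the PURGE request method are assumptions; match them to whatever your Drupal purge module is configured to send.

    ```vcl
    # Hypothetical ACL: only these hosts may invalidate cached objects.
    acl purge_allowed {
        "localhost";
        "127.0.0.1";
    }

    sub vcl_recv {
        if (req.method == "PURGE") {
            # Reject purge requests from hosts outside the ACL.
            if (client.ip !~ purge_allowed) {
                return (synth(405, "Purging not allowed"));
            }
            # Remove the object matching this request's host and URL.
            return (purge);
        }
    }
    ```

    Because Drupal talks to Varnish directly for purging, the ACL should list the Drupal hosts, not CloudFront.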

    ETag-based revalidation

    When revalidation has to take place, because the content has expired from the cache, this can be done conditionally.

    By sending an ETag header in your response, the consuming client will store that ETag and will send its value in the next request in the form of an If-None-Match header.

    If that If-None-Match header value is identical to the ETag header value on the next response, it's basically the same unchanged content. This means that the server can return a 304 Not Modified status code and not return any data.

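    While there's probably a way of doing this in Drupal, the point is that Varnish does it out of the box in both directions: it answers conditional requests from its clients with a 304 Not Modified when the ETag matches, and it sends conditional requests to the backend when it revalidates one of its own objects.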

    If the client that Varnish interacts with happens to be CloudFront, CloudFront could receive 304 Not Modified responses from Varnish if revalidated content is still the same.

    Stale-While-Revalidate

    While there are regular revalidations that return a 200 OK response, there are also conditional revalidations that return 304 Not Modified, as I explained in the previous section.

    However, these revalidations are usually synchronous and require the client (browser or CloudFront) to wait until the server has responded.

    Luckily both Varnish & CloudFront support asynchronous revalidation that is based on the stale-while-revalidate directive of the Cache-Control header.

    Imagine the following Cache-Control header:

    Cache-Control: public, max-age=3600, stale-while-revalidate=7200
    

    If Drupal sends this type of header, Varnish will use the value of the max-age (or s-maxage if present) directive to set the TTL of the cached object.

    If the object expires, it must be revalidated. However, if there is tolerated staleness remaining, the object will still be served to the requesting client while an asynchronous fetch to the origin takes place.

    This ensures that clients who hit an expired object aren't the victim of potential latency.

    Because the remaining TTL is a moving target, the following equation applies:

    Total Object Lifetime = TTL + grace + keep
    

    For reference: grace is Varnish's implementation of stale-while-revalidate and contains the allowed staleness. It defaults to 10 seconds.

    The keep value keeps expired and out-of-grace objects around to ensure that ETag revalidation can still happen synchronously. However, the default value of keep is zero seconds.

    If the remaining TTL has dropped to zero because the object expired, the remaining grace can still keep the total object lifetime above zero. In that case, revalidation will take place, but it will be asynchronous (and potentially conditional).

    However, if the sum of the remaining TTL and the grace value is below zero, synchronous revalidation is required. If the keep value is high enough to keep the object around, conditional revalidation is still possible at that point.
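
    Since keep defaults to zero, conditional revalidation towards Drupal only remains possible after grace has run out if you raise it yourself. A minimal sketch, assuming a one-day keep is acceptable for your content (the value is hypothetical), and assuming Drupal's responses carry an ETag or Last-Modified header for Varnish to revalidate against:

    ```vcl
    sub vcl_backend_response {
        # Hypothetical value: hold on to objects for one more day after TTL and
        # grace have run out. Such objects are no longer served as-is, but Varnish
        # can still send a conditional request to the backend and reuse the stored
        # body if the backend answers with a 304 Not Modified.
        set beresp.keep = 1d;
    }
    ```

    Varnish concatenates subroutines that share a name, so this can sit alongside an existing vcl_backend_response definition.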

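    You can configure staleness in various ways: through the stale-while-revalidate directive that the origin sends, by setting beresp.grace in VCL, or through Varnish's default_grace runtime parameter.

    Apparently CloudFront also supports stale-while-revalidate, which is great. But if you don't set stale-while-revalidate in Drupal, you should expose it in VCL. You could add the following piece of VCL code to do that: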

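    sub vcl_backend_response {
        # Allow Varnish to serve stale objects for up to one day past their TTL.
        set beresp.grace = 1d;
        # Advertise the same window downstream unless the backend already set one.
        if (beresp.http.Cache-Control !~ "stale-while-revalidate") {
            set beresp.http.Cache-Control = beresp.http.Cache-Control + ", stale-while-revalidate=86400";
        }
    }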
    

    This will expose the grace value of Varnish in a standardized way that CloudFront will understand.

    CloudFront content revalidation

    Now that we have established that Varnish stays up to date thanks to effective cache purging in Drupal, and now that we know about conditional and asynchronous cache validation, you can start leveraging these features in CloudFront.

    Even though the Cache-Control TTL values are pretty long, you'll want to keep the TTLs in CloudFront low. As you explained, purging the cache in CloudFront is cumbersome and expensive.

    Even if you set the TTL in CloudFront to a couple of seconds, two mechanisms lower the impact of the resulting frequent revalidation: conditional requests, which let Varnish answer with a bodiless 304 Not Modified, and stale-while-revalidate, which makes the revalidation asynchronous. CloudFront will still distribute content globally, while fetching updates from Varnish regularly.

    And because Varnish is so powerful, it acts as an origin shield that protects Drupal from the impact of frequent CloudFront revalidation.

    And because of stale-while-revalidate, the end-user will hardly notice the impact of the low TTLs in CloudFront.