We have a setup with Varnish + CloudFront in front of Drupal 10.
CloudFront -> Varnish -> Nginx/PHP-FPM -> Drupal 10
Drupal suggests turning off the internal page cache for large sites (over 100k pages in our case) and using Varnish as the sole arbiter of cache freshness. Varnish integrates very well with the various cache purging mechanisms available in Drupal.
This approach is widely recommended: https://dev.acquia.com/blog/drupal-cache-strategy-varnish-and-edge-cdn-acquia
However, here is the challenge.
CloudFront needs to be configured to revalidate non-static pages, as clearing the cache in CloudFront is cumbersome and expensive.
When CloudFront sends a revalidation request, Varnish passes it on to the origin, which obviously doesn't have the page cached because the internal page cache is turned off in favour of Varnish. This makes the Varnish proxy quite useless if a CDN like CloudFront or Cloudflare is put in front of it.
We have turned on the internal page cache, but that makes Varnish redundant in this setup.
We are using the VCL specified in this post: https://www.varnish-software.com/developers/tutorials/configuring-varnish-drupal/
So the question is: is there a way to make Varnish the sole arbiter of cache freshness through Varnish configuration, so that it doesn't go back to the origin for revalidation within the TTL?
Varnish is not the source of truth in your setup; Drupal takes care of that part. However, Varnish can be used as a source for CloudFront to revalidate content against.
You referred to a Varnish Software Developer Portal tutorial that I wrote. This VCL configuration will not only cache Drupal pages and static assets, it also provides a mechanism to purge these pages when changes occur in Drupal.
This means you can have longer TTLs for Drupal content without risking that the content becomes stale: when you alter content in Drupal, Drupal will connect to Varnish to purge the relevant objects from the cache.
This means that Varnish (to some extent) can be a reliable source for revalidation.
When revalidation has to take place, because the content has expired from the cache, this can be done conditionally.
If you send an ETag header in your response, the consuming client will store that ETag and send its value in the next request in the form of an If-None-Match header.
If that If-None-Match header value is identical to the ETag header value of the next response, the content is unchanged. This means that the server can return a 304 Not Modified status code and not return any data.
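The ETag comparison described above can be sketched in a few lines of Python. This is only an illustration of the decision, not how Varnish or Drupal implement it: the function names are hypothetical, the comparison is the weak one (a W/ prefix is ignored), and splitting the header on commas is a simplification that assumes the ETag values themselves contain no commas.

```python
from typing import Optional, Tuple


def _normalize(etag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) for weak comparison."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def conditional_response(cached_etag: str, cached_body: bytes,
                         if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body) for a GET against a cached object.

    Hypothetical helper: 304 with an empty body when the client's
    If-None-Match value matches the stored ETag, 200 with the body otherwise.
    """
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated ETags, or "*".
        candidates = [_normalize(value) for value in if_none_match.split(",")]
        if "*" in candidates or _normalize(cached_etag) in candidates:
            return 304, b""
    return 200, cached_body
```

A client that sends the ETag it stored earlier gets a 304 and no body; a client with an outdated ETag, or none at all, gets the full 200 response.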
While there's probably a way of doing this in Drupal, the point is that Varnish does it out of the box in both directions:
- Varnish will send If-None-Match headers to the Drupal application if that application sent an ETag in a previous response. Varnish can interpret a 304 Not Modified response returned by Drupal and re-arm the TTL accordingly.
- Varnish will interpret If-None-Match headers that are received from the client and return a 304 Not Modified response if the value matches the ETag header that is stored in the cached object.
If the client that Varnish interacts with happens to be CloudFront, CloudFront could receive 304 Not Modified responses from Varnish if the revalidated content is still the same.
While there are regular revalidations that return a 200 OK response, there are also conditional revalidations that return 304 Not Modified, as I explained in the previous section.
However, these revalidations are usually synchronous and require the client (browser or CloudFront) to wait until the server has responded.
Luckily, both Varnish and CloudFront support asynchronous revalidation based on the stale-while-revalidate directive of the Cache-Control header.
Imagine the following Cache-Control header:
Cache-Control: public, max-age=3600, stale-while-revalidate=7200
If Drupal sends this type of header, Varnish will use the value of the max-age (or s-maxage if present) directive to set the TTL of the cached object.
If the object expires, it must be revalidated. However, if there is still tolerated staleness left, the object will still be served to the requesting client while an asynchronous fetch to the origin takes place.
This ensures that clients who hit an expired object aren't the victims of potential latency.
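To make the mapping from the header to cache timers concrete, here is a rough Python sketch of how the TTL and the tolerated staleness could be derived from a Cache-Control value. The function names and the default_ttl parameter are assumptions for illustration. The sketch ignores other inputs a real cache considers, such as the Age and Expires headers, and it does not handle malformed values.

```python
from typing import Dict, Optional, Tuple


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if value else None
    return directives


def ttl_and_grace(header: str, default_ttl: float,
                  default_grace: float = 10.0) -> Tuple[float, float]:
    """Return (ttl, grace) in seconds.

    s-maxage takes precedence over max-age for the TTL, and
    stale-while-revalidate overrides the default grace of 10 seconds.
    """
    directives = parse_cache_control(header)
    raw_ttl = directives.get("s-maxage")
    if raw_ttl is None:
        raw_ttl = directives.get("max-age")
    ttl = float(raw_ttl) if raw_ttl is not None else default_ttl
    raw_grace = directives.get("stale-while-revalidate")
    grace = float(raw_grace) if raw_grace is not None else default_grace
    return ttl, grace
```

For the header shown above, this yields a TTL of 3600 seconds and 7200 seconds of tolerated staleness.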
Because the remaining TTL is a moving target, the following equation applies:
Total Object Lifetime = TTL + grace + keep
For reference: grace is Varnish's implementation of stale-while-revalidate and contains the allowed staleness. It defaults to 10 seconds.
The keep value keeps expired and out-of-grace objects around to ensure that ETag revalidation can still happen synchronously. However, the default value of keep is zero seconds.
If the remaining TTL value is zero because the object expired, the remaining grace value could still mean the total object lifetime remains greater than zero. In that case, revalidation will take place, but it will be asynchronous (and potentially conditional).
However, if the sum of the remaining TTL and grace value is below zero, synchronous revalidation is required. If the keep value is high enough to keep the object around, conditional revalidation is still possible at that point.
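The three phases of the object lifetime can be summarized with a small Python sketch. It is a simplified model of the equation above, not Varnish's actual logic: the Action names and the classify function are hypothetical, and conditional revalidation during the keep phase assumes the stored object carries a validator such as an ETag.

```python
from enum import Enum


class Action(Enum):
    SERVE_FRESH = "serve from cache"
    SERVE_STALE_ASYNC = "serve stale, revalidate in the background"
    SYNC_CONDITIONAL = "fetch synchronously, conditional request possible"
    SYNC_FULL = "fetch synchronously, object is gone"


def classify(age: float, ttl: float, grace: float, keep: float) -> Action:
    """Decide what happens to a request for an object of the given age.

    All arguments are in seconds; age is the time since the object was stored.
    """
    if age < ttl:
        return Action.SERVE_FRESH            # remaining TTL is still positive
    if age < ttl + grace:
        return Action.SERVE_STALE_ASYNC      # expired, but within grace
    if age < ttl + grace + keep:
        return Action.SYNC_CONDITIONAL       # out of grace, kept for ETag revalidation
    return Action.SYNC_FULL                  # total object lifetime exceeded
```

With a TTL of 3600 seconds and a grace of 7200 seconds, an object that is 4000 seconds old is served stale while a background fetch runs. At 11000 seconds it is out of grace, and whether a conditional fetch is still possible depends on keep being greater than zero.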
You can configure staleness in various ways:
- By adding a stale-while-revalidate directive to your Cache-Control header in Drupal, whose value will override the standard grace value
- By setting grace and keep in VCL through set beresp.grace=xyz; and set beresp.keep=xyz;
- By setting the default_grace and default_keep runtime parameters of Varnish
Apparently CloudFront also supports stale-while-revalidate, which is great. But if you don't set stale-while-revalidate in Drupal, you should expose it in VCL. You could add the following piece of VCL code to do that:
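sub vcl_backend_response {
    # Allow Varnish to serve stale content for up to one day while it revalidates.
    set beresp.grace = 1d;
    # Advertise the same staleness window downstream, unless the backend already set one.
    if (beresp.http.Cache-Control !~ "stale-while-revalidate") {
        set beresp.http.Cache-Control = beresp.http.Cache-Control + ", stale-while-revalidate=86400";
    }
}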
This will expose the grace value of Varnish in a standardized way that CloudFront will understand.
Now that we have established that Varnish stays up to date thanks to effective cache purging in Drupal, and now that we know about conditional and asynchronous cache validation, you can start leveraging these features in CloudFront.
Despite the pretty long TTL values in the Cache-Control header, you'll want to keep the TTLs in CloudFront low. As you explained, purging the cache in CloudFront is cumbersome and expensive.
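Even if you set the TTL in CloudFront to a couple of seconds, conditional revalidation means that unchanged content comes back from Varnish as a 304 Not Modified without a body, and stale-while-revalidate means that the revalidation happens asynchronously.
These 2 mechanisms will lower the impact of frequent cache revalidation (due to low TTLs in CloudFront), ensuring the CloudFront CDN will distribute content globally while still fetching updates from Varnish regularly.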
And because Varnish is so powerful, it acts as an origin shield that protects Drupal from the impact of frequent CloudFront revalidation.
And because of stale-while-revalidate, the end-user will hardly notice the impact of the low TTLs in CloudFront.