How can my application leverage etags, and does introducing streaming/chunked encoding introduce any complications?
When doing HTTP streaming with Transfer-Encoding: chunked, Content-Length can't be sent because it often is not known.
To my understanding, when browsers leverage etags they require knowing Content-Length. If an etag is provided but not Content-Length, browsers will never send If-None-Match.
Is there a way around this?
Etags are values sent in the ETag HTTP response header. They are used to version pages and allow the client to reuse a previously cached copy of a page if the page has not changed.
The basic idea is that the client goes to a page and sends an HTTP request to the server that hosts the page. The server renders the page and returns the response to the client along with an etag that holds some value. In addition to showing the page, the client stores a copy of that page in its local cache along with the etag. The next time the client visits that page, it issues a request to the web server but includes the etag in an If-None-Match header. Such a request is known as a conditional GET. The client is saying, "I would like this page, however I already have a cached version of the page with this etag value, so if you think that my cached version is current, just tell me that, and I'll show my cached copy to the user".
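The exchange described above can be modelled in a few lines. This is a minimal sketch, not any browser's actual cache logic: `ClientCache`, `fake_server`, the URL and the `"v1"` etag are all hypothetical, and the server compares etags by exact string equality for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict
    body: bytes


@dataclass
class ClientCache:
    # Hypothetical browser cache: url -> (etag, cached body).
    entries: dict = field(default_factory=dict)

    def request_headers(self, url: str) -> dict:
        # With a cached copy on file, the request becomes a conditional GET.
        entry = self.entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def handle(self, url: str, resp: Response) -> bytes:
        if resp.status == 304:
            # Server says "your copy is current": show the cached body.
            return self.entries[url][1]
        etag = resp.headers.get("ETag")
        if etag is not None:
            # Store the new copy together with its etag.
            self.entries[url] = (etag, resp.body)
        return resp.body


def fake_server(request_headers: dict) -> Response:
    # Hypothetical server whose page is currently at version "v1".
    etag = '"v1"'
    if request_headers.get("If-None-Match") == etag:
        return Response(304, {"ETag": etag}, b"")
    return Response(200, {"ETag": etag}, b"<html>page</html>")


cache = ClientCache()
url = "/product/42"
first = cache.handle(url, fake_server(cache.request_headers(url)))   # full 200 response, stored
second = cache.handle(url, fake_server(cache.request_headers(url)))  # 304, cached copy reused
```

The second request carries If-None-Match, the server answers 304 with no body, and the client still ends up with the same page bytes.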
There aren't any semantic requirements for the etag value. It should hold a value that lets you determine whether the client's copy is up to date.
The simplest way to do this is to calculate a hash of your response: if the hash matches the etag value in the request's If-None-Match header, the client already holds an identical copy, and you can return 304 Not Modified with an empty body. This is much faster than sending the entire page again.
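A minimal sketch of the hash approach, assuming a hypothetical `render` callable that produces the full page body. SHA-256 is an arbitrary choice of hash, and the If-None-Match parsing is simplified: it splits on commas (safe for hex-digest etags) and does not handle weak `W/` etags.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong etag: the quoted SHA-256 hex digest of the complete body.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def etag_matches(if_none_match: Optional[str], etag: str) -> bool:
    # If-None-Match may be "*" or a comma-separated list of etags.
    if if_none_match is None:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    return "*" in candidates or etag in candidates


def handle(render: Callable[[], bytes],
           request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    body = render()  # the page is still rendered on every request
    etag = make_etag(body)
    if etag_matches(request_headers.get("If-None-Match"), etag):
        # 304 Not Modified never carries a body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

Note that this saves bandwidth, not server work: the page must be fully rendered before the hash can be compared.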
While calculating a hash is a simple and safe way to determine whether the cache is still good, more intelligent techniques exist that reduce the load on your web server. Consider a page that displays a product in a webshop. Instead of rendering the page with the product description and then computing and comparing the hash, you could use the product's updated_at attribute as the etag. The first thing your application does is read the etag from the request and fetch the product from the database to compare it against the updated_at attribute. If they match, you assume the product's details have not changed, skip rendering entirely and return the 304 Not Modified response.
However, be careful with this kind of optimization, as there may be other content on the page that can become outdated without affecting the updated_at attribute of the product in your database. This could be a sidebar with the latest news or, worse, a personalized part of the page such as a shopping cart listing previously added products.
Chunked encoding is merely a technique for transferring a response in multiple chunks, so the client can start rendering the page while the server is still producing the remaining chunks. It is independent of caching: an ETag header can be sent on a chunked response just as on one with a Content-Length. However, you cannot use a hash of the response as the etag, because the headers are sent before the full response exists, and the full response is needed to compute the hash.
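A minimal sketch of a streamed response that still carries an etag, assuming the etag is derived from something known before rendering starts (such as a hypothetical updated_at value) rather than from a hash of the body. It writes HTTP/1.1 chunked framing by hand purely for illustration; `stream_response`, `respond` and `render_parts` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


def chunk_frame(data: bytes) -> bytes:
    # One HTTP/1.1 chunk: size in hex, CRLF, data, CRLF.
    return b"%X\r\n" % len(data) + data + b"\r\n"


def stream_response(etag: str, parts: Iterable[bytes]) -> Iterator[bytes]:
    # Headers go out first, so the etag must already be known here.
    yield (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        "Transfer-Encoding: chunked\r\n"
        f"ETag: {etag}\r\n"
        "\r\n"
    ).encode()
    for part in parts:
        if part:  # a zero-length chunk would prematurely end the body
            yield chunk_frame(part)
    yield b"0\r\n\r\n"  # last chunk, no trailers


def respond(request_headers: dict, etag: str,
            render_parts: Callable[[], Iterable[bytes]]) -> Iterable[bytes]:
    if request_headers.get("If-None-Match") == etag:
        # Cache hit: no body, so nothing needs to be streamed or rendered.
        return [f"HTTP/1.1 304 Not Modified\r\nETag: {etag}\r\n\r\n".encode()]
    return stream_response(etag, render_parts())
```

Because `render_parts` is only called on a cache miss, a matching If-None-Match skips rendering, while a miss streams the page chunk by chunk without any Content-Length.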