I am downloading an archive with wget. How can I make wget redownload that file only when it is newer on the server or its size has changed?
I'm aware of the -N flag, but it doesn't work.
TL;DR There is a critical bug introduced in or around wget 1.17 that broke this feature.
In older wget, you need to do wget -N https://example.com/file.zip
In newer wget, you need to do wget -N --no-if-modified-since https://example.com/file.zip
The server must support HEAD requests and provide both a timestamp (Last-Modified) and a size (Content-Length).
Use the -d flag to display the request and response headers for debugging.
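As a rough way to check that requirement, the helper below reads response headers on stdin and succeeds only if both Last-Modified and Content-Length are present. The function name has_timestamp_and_size is made up for this sketch, and the pattern allows leading whitespace because wget -S indents the headers it prints.

```shell
# Hypothetical helper (not part of wget): reads HTTP response headers on stdin
# and returns success only when both headers needed by -N are present.
has_timestamp_and_size() {
    headers=$(cat)
    printf '%s\n' "$headers" | grep -qi '^[[:space:]]*Last-Modified:' &&
    printf '%s\n' "$headers" | grep -qi '^[[:space:]]*Content-Length:'
}
```

It can be fed from a header-only request, for example: wget -S --spider https://example.com/file.zip 2>&1 | has_timestamp_and_size && echo "both headers present"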
wget --version
wget -N -d https://example.com/file.zip
truncate --size 1 file.zip
wget -N -d https://example.com/file.zip
In older versions, where this used to work, wget sends a HEAD request to obtain the last-modified time and the file size; if either has changed, it then sends a GET request (without If-Modified-Since) to download the file.
In newer versions, wget sends a single GET request with an If-Modified-Since header, so that the file is downloaded only if it has changed since that date. Unfortunately, that doesn't work in practice.
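The older decision can be sketched roughly as follows. This is only an approximation of the behavior described above, not wget's actual code; the function name needs_download and its arguments are hypothetical, and stat -c assumes GNU coreutils (the same toolset as truncate --size).

```shell
# Hypothetical sketch of the pre-1.17 check, given values taken from a HEAD response.
# Usage: needs_download LOCAL_FILE REMOTE_MTIME_EPOCH REMOTE_SIZE_BYTES
# Returns success (0) when the file should be downloaded again.
needs_download() {
    local_file=$1
    remote_epoch=$2
    remote_size=$3

    # No local copy yet: always download.
    [ -e "$local_file" ] || return 0

    local_size=$(stat -c %s "$local_file")   # size in bytes
    local_epoch=$(stat -c %Y "$local_file")  # mtime, seconds since the epoch

    # Download if the sizes differ or the server copy is newer.
    [ "$local_size" -ne "$remote_size" ] || [ "$remote_epoch" -gt "$local_epoch" ]
}
```

With a truncated local file the sizes differ, so this check asks for a new download even though the timestamp on the server is unchanged. That is the case the single conditional GET cannot see.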
The change in behavior is broken by design. It cannot detect changes in file size, and as a side effect it prevents wget from recovering after an interrupted, partial download.
When wget sends an HTTP GET request with a timestamp, the server can respond with a 304 Not Modified status, with no content and no file size header (Content-Length). Unfortunately, this leaves wget no chance to ever learn the file size or to redownload the file.
# wget 1.21 in ubuntu 22, broken
wget -N https://example.com/file.zip -d
truncate --size 1 file.zip
wget -N https://example.com/file.zip -d
---request begin---
GET /file.zip HTTP/1.1
Host: example.com
If-Modified-Since: Thu, 31 Aug 2023 18:22:20 GMT
User-Agent: Wget/1.21.2
Accept: */*
Accept-Encoding: identity
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 304 Not Modified
Date: Wed, 06 Sep 2023 09:10:16 GMT
Connection: keep-alive
Last-Modified: Thu, 31 Aug 2023 18:22:20 GMT
ETag: f37ffefc58f99f0b996a38154d87820344d86d41
Accept-Ranges: bytes
Content-Disposition: attachment; filename="file.zip"; filename*=UTF-8''file.zip
---response end---
304 Not Modified
Registered socket 3 for persistent reuse.
File ‘file.zip’ not modified on server. Omitting download.
Web browsers do not suffer from this caching issue because they store the ETag header from the initial response, an identifier that represents one specific version of the file. Apache and nginx generate the ETag automatically when serving static files, based on the last modification time and the file size.
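To illustrate how such an ETag ties together timestamp and size, here is a small sketch of the layout nginx uses for static files, as far as I know: the modification time and the size, both in hexadecimal, joined by a dash and quoted. Apache uses a different layout, and other servers may use something else entirely (the ETag in the log above looks like a hash), so treat this as an example only. The function name nginx_style_etag is made up.

```shell
# Hypothetical helper: builds an nginx-style ETag from an mtime (epoch seconds)
# and a size (bytes). Format: "<hex mtime>-<hex size>", including the quotes.
nginx_style_etag() {
    printf '"%x-%x"\n' "$1" "$2"
}

# For a local file, the two inputs could come from GNU stat, e.g.:
#   nginx_style_etag "$(stat -c %Y file.zip)" "$(stat -c %s file.zip)"
```

Because the size is part of the value, a client that compares ETags notices a change even when the timestamp alone says nothing is newer.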