I want to mirror my CMS-driven website to create an HTML-only static version. Local viewing is enabled with the -k option (--convert-links).
However, wget converts question marks (?) to the percent-encoded ASCII sequence %3F in the HTML files it writes.
wget -m -nH -np -k -E --restrict-file-names=unix,nocontrol https://localhost/mysite
Example:
Input from source https://localhost/mysite:
<link href="/dist/css/main.css?fp=12345" type="text/css" rel="stylesheet">
<a href="/contact">Contact</a>
Expected output from wget:
<link href="/dist/css/main.css?fp=12345" type="text/css" rel="stylesheet">
<a href="/contact.html">Contact</a>
Actual output from wget:
<link href="/dist/css/main.css%3Ffp=12345.css" type="text/css" rel="stylesheet">
<a href="/contact.html">Contact</a>
Note that the contact link now has an .html suffix (-E) on purpose. This is correct. The .css suffix appended after the fingerprint can be disregarded for this use case. Note, however, that the ? in front of the fingerprint is converted to %3F; this breaks local viewing.
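To see how many pages are affected before changing anything, a small grep helper can list the HTML files that still contain the encoded form. This is a sketch: the function name find_encoded_fingerprints is hypothetical, and it assumes the fingerprint parameter is always called fp and that grep supports --include (GNU and BSD grep both do).

```shell
#!/bin/bash
# Hypothetical helper: list *.html files below a mirror root whose content
# contains the percent-encoded fingerprint marker "%3Ffp=".
# Assumption: the fingerprint query parameter is always named "fp".
find_encoded_fingerprints() {
  local mirror_root="$1"
  # -r: recurse into subfolders, -l: print only matching file names,
  # -F: treat the pattern as a literal string, --include: *.html only.
  grep -rlF --include='*.html' '%3Ffp=' "$mirror_root"
}
```

For example, find_encoded_fingerprints /var/www/mysite prints one path per affected file; an empty result means no link carries an encoded fingerprint.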
How would I mirror my website and keep fingerprints intact?
One possible solution is a search-and-replace with sed in a shell script:
#!/bin/bash
# Replace every occurrence of the string %3Ffp with ?fp in all *.html files
# below /var/www/mysite (sed -i edits each file in place).
find "/var/www/mysite" -type f -name "*.html" -exec sed -i 's/%3Ffp/?fp/g' {} +
Be careful with the path, though: this command edits every HTML file in /var/www/mysite and its subfolders in place.
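If the script is run repeatedly or against different mirrors, a slightly more defensive variant may help. This is a sketch under the same assumptions as the script above (fingerprint parameter named fp); the function name restore_fingerprints, the directory argument and the .bak backups are hypothetical additions. It refuses to run on a missing directory, only touches files that actually contain %3Ffp=, and keeps the original of each edited file as FILE.bak.

```shell
#!/bin/bash
# Hypothetical helper: turn "%3Ffp=" back into "?fp=" in *.html files,
# keeping a .bak copy of every file it changes.
# Assumptions: fingerprint parameter named "fp"; sed supports -i.bak
# (GNU sed and BSD sed both accept an attached backup suffix).
restore_fingerprints() {
  local mirror_root="$1"

  # Guard against a mistyped or missing path before editing anything.
  if [[ ! -d "$mirror_root" ]]; then
    echo "not a directory: $mirror_root" >&2
    return 1
  fi

  # The second -exec only runs when grep -q succeeds, i.e. only files
  # that contain the encoded marker are rewritten (and backed up).
  find "$mirror_root" -type f -name '*.html' \
    -exec grep -qF '%3Ffp=' {} \; \
    -exec sed -i.bak 's/%3Ffp=/?fp=/g' {} \;
}
```

Usage: restore_fingerprints /var/www/mysite. Once the result looks right, the backups can be removed with find /var/www/mysite -name '*.html.bak' -delete.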