
wget converts fingerprints from ? into %3F


I want to mirror my CMS-driven website to create an HTML-only static version. Local viewing is enabled with the -k option (--convert-links).

However, wget converts question marks (?) to the percent-encoded form %3F in the HTML files it writes.

wget -m -nH -np -k -E --restrict-file-names=unix,nocontrol https://localhost/mysite

Example:

Input from source https://localhost/mysite:

<link href="/dist/css/main.css?fp=12345" type="text/css" rel="stylesheet">
<a href="/contact">Contact</a>

Expected output from wget:

<link href="/dist/css/main.css?fp=12345" type="text/css" rel="stylesheet">
<a href="/contact.html">Contact</a>

Actual output from wget:

<link href="/dist/css/main.css%3Ffp=12345.css" type="text/css" rel="stylesheet">
<a href="/contact.html">Contact</a>

Note that the contact link now has an .html suffix (-E) on purpose; this is correct. The .css suffix appended to the fingerprint can be disregarded for this use case.

The problem is that the ? before the fingerprint is converted to %3F, which breaks local viewing.

How can I mirror my website and keep the fingerprints intact?


Solution

  • A possible solution is to search-replace with a sed shell script:

    #! /bin/bash
    
    # replaces all occurrences of the string %3Ffp with ?fp in *.html files
    find "/var/www/mysite" -type f -name "*.html" -exec sed -i -s -r 's/%3Ffp/?fp/g' {} +
    

    Be careful with the path, though: this command edits every .html file in /var/www/mysite and its subfolders in place.
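
    Because the replacement happens in place, it may be worth wrapping it in a small function that checks the directory first and lists the files it is about to change. The following is only a sketch under a few assumptions: GNU sed and grep are available, the fingerprint parameter is always named fp as in the example above, and the function name restore_fingerprints is made up for illustration.

```shell
#!/bin/bash

# restore_fingerprints DIR
# Hypothetical helper (not part of wget): rewrites "%3Ffp=" back to "?fp="
# in every *.html file below DIR. Assumes GNU sed and grep, and that the
# fingerprint parameter is named "fp" as in the example above.
restore_fingerprints() {
    local dir="$1"

    # Refuse to run without an existing directory, so a typo in the path
    # cannot send sed into the wrong tree.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "usage: restore_fingerprints DIR" >&2
        return 1
    fi

    # Print the files that contain an encoded fingerprint before editing them.
    grep -rl --include='*.html' '%3Ffp=' "$dir"

    # -i edits in place. The pattern contains no regex metacharacters,
    # so basic regular expressions are enough and -r is not needed.
    find "$dir" -type f -name '*.html' -exec sed -i 's/%3Ffp=/?fp=/g' {} +
}
```

    After mirroring, it could be called as restore_fingerprints /var/www/mysite. Matching on "%3Ffp=" rather than on "%3F" alone keeps other percent-encoded question marks untouched.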