Problem
Downloading a complete working offline copy of a website that loads links/images dynamically
Research
There are questions (e.g. [1], [2], [3]) on Stack Overflow addressing this issue. Most of the top answers use wget or httrack, both of which fail miserably (please do correct me if I am wrong) on pages that dynamically load links, use srcset instead of src for img tags, or load anything else via JS. A rather obvious solution is Selenium. However, if you have ever used Selenium in production, you quickly start seeing the issues that arise from such a decision: it is resource heavy, a head-full driver is quite complex to use, and it is not built for this purpose. That being said, there are people who claim to have been using it easily in production for years.
Expected Solution
A script (preferably in Python) that parses the page for links and loads them separately. I cannot seem to find any existing script that does that. If your solution is "so implement your own", then there was no point in asking the question in the first place; I am looking for an existing implementation.
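To make "parses the page for links" concrete, here is a minimal sketch using only the Python standard library. It is not an existing tool, just an illustration of the idea: the names parse_srcset, AssetCollector and collect_urls are made up for this example, and the srcset handling assumes plain URLs without commas in them (data: URIs would break the naive split). It only sees what is in the HTML it is given, so for JS-loaded content it would have to be fed the rendered DOM rather than the raw response.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_srcset(value):
    """Return the URLs from a srcset value, dropping the 1x/2x/480w descriptors."""
    urls = []
    for candidate in value.split(","):
        parts = candidate.strip().split()
        if parts:
            urls.append(parts[0])
    return urls


class AssetCollector(HTMLParser):
    """Collects absolute URLs found in href, src and srcset attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name in ("href", "src"):
                found = [value]
            elif name == "srcset":
                found = parse_srcset(value)
            else:
                continue
            for url in found:
                absolute = urljoin(self.base_url, url)
                # Keep first-seen order and skip duplicates.
                if absolute not in self.urls:
                    self.urls.append(absolute)


def collect_urls(html, base_url):
    """Return every linked page/asset URL in html, resolved against base_url."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Each URL returned by collect_urls could then be downloaded separately and the attribute rewritten to point at the local copy, which is the part wget and httrack skip for srcset.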
Examples
There are now head-less versions of Selenium and alternatives such as PhantomJS; either can be used with a small script to scrape any dynamically loaded website.
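As a rough idea of what such a small script could look like, here is a sketch with Selenium 4 and head-less Chrome. The assumptions: the selenium package and a recent Chrome are installed (older Chrome versions take "--headless" rather than "--headless=new"), a fixed sleep is good enough to let the page's scripts finish, and the helper names render_page and local_path_for are invented for this example. A real scraper would use explicit waits instead of sleeping.

```python
import time
from urllib.parse import urlparse


def local_path_for(url):
    """Map a URL to a relative file path for the offline copy."""
    parsed = urlparse(url)
    path = parsed.path
    # Directory-style URLs get an index.html so they can be opened from disk.
    if not path or path.endswith("/"):
        path += "index.html"
    return f"{parsed.netloc}/{path.lstrip('/')}"


def render_page(url, wait_seconds=2.0):
    """Load url in head-less Chrome and return the DOM after scripts have run."""
    # Imported here so the rest of the module works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for dynamically loaded content
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by render_page is the rendered HTML, so it contains the links and images that were inserted by JS; it can be written to the path given by local_path_for and then scanned for further assets to download.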
I have implemented a generic scraper here, and explained more about the topic here.