I am trying to extract a CSV file that is exposed through a blob URL on this page using Beautiful Soup: https://worldpopulationreview.com/country-rankings/exports-by-country
Here's my code:
exports = pd.read_csv(io.StringIO(requests.get(BeautifulSoup(requests.get('https://worldpopulationreview.com/country-rankings/exports-by-country').text,\
'html.parser').find_all(download="csvData.csv"))))
What I got was an exception, and the href contained no blob link. The blob URL does exist when I inspect the HTML in my browser.
Since the scraped href does not contain the blob URL, I decided to send a GET request to the blob URL itself instead, but this exception appears:
requests.exceptions.InvalidSchema: No connection adapters were found for 'blob:https://worldpopulationreview.com/850ac28e-9cd9-46b6-9423-e96a0bd7e938'
Is there a way to web scrape blob URLs?
These blob URLs are created only in the browser, usually with JavaScript; they don't exist on the server at all, so you cannot download them with requests.
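As a rough illustration of why the request fails before anything is sent: a default requests session only has adapters for http:// and https://, and the scheme of this URL is blob. The helper name below is made up for this sketch; it uses only the standard library.

```python
from urllib.parse import urlsplit

# Schemes a default requests session has connection adapters for.
FETCHABLE_SCHEMES = {"http", "https"}


def can_fetch_with_requests(url):
    """Return True if the URL scheme is one a default requests session handles.

    Hypothetical helper for illustration; it is not part of requests.
    """
    return urlsplit(url).scheme in FETCHABLE_SCHEMES


blob_url = "blob:https://worldpopulationreview.com/850ac28e-9cd9-46b6-9423-e96a0bd7e938"
page_url = "https://worldpopulationreview.com/country-rankings/exports-by-country"

print(urlsplit(blob_url).scheme)          # the scheme is "blob", not "https"
print(can_fetch_with_requests(blob_url))
print(can_fetch_with_requests(page_url))
```

The blob URL only names an object held in the memory of the browser tab that created it, so there is nothing on the network for requests to connect to.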
You could run a JavaScript snippet in the browser console to get the content. Here is an example of how to fetch a blob URL in JavaScript: https://stackoverflow.com/a/52410044/
If you need to do this automatically, you could write a userscript to do it, or use an automation tool such as AutoHotkey to click the download link for you.
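If you would rather stay in Python, another option (not one of the tools named above) is to drive a real browser with Selenium and read the blob from inside the page, since the page that created the blob URL can fetch it. This is only a sketch under some assumptions: Selenium and Chrome are installed, the link's href has been set to a blob: URL once the page's JavaScript has run, and the function names are made up for this example.

```python
# JavaScript executed inside the page by Selenium. The last argument is the
# callback that Selenium injects for execute_async_script.
FETCH_BLOB_JS = """
const blobUrl = arguments[0];
const done = arguments[arguments.length - 1];
fetch(blobUrl)
  .then(response => response.text())
  .then(text => done(text), () => done(null));
"""


def fetch_blob_text(driver, blob_url):
    """Read a blob: URL from inside the page that created it."""
    if not blob_url.startswith("blob:"):
        raise ValueError(f"not a blob URL: {blob_url!r}")
    text = driver.execute_async_script(FETCH_BLOB_JS, blob_url)
    if text is None:
        raise RuntimeError("the page could not read the blob URL")
    return text


def read_blob_csv_text(page_url, link_css, timeout=20):
    """Open page_url in Chrome and return the CSV text behind the download link."""
    # Imported here so fetch_blob_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def blob_href(d):
        # Return the href once the page's JavaScript has set it to a blob: URL.
        href = d.find_element(By.CSS_SELECTOR, link_css).get_attribute("href")
        return href if href and href.startswith("blob:") else False

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        blob_url = WebDriverWait(driver, timeout).until(blob_href)
        return fetch_blob_text(driver, blob_url)
    finally:
        driver.quit()
```

Calling read_blob_csv_text with the page URL and the selector '[download="csvData.csv"]' should return the CSV as a string, which can then be passed to pd.read_csv(io.StringIO(text)) as in the question. I have not verified this against the site, so the selector and the timing may need adjusting.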