I am trying to find a way to export a Microsoft Excel spreadsheet (.xlsx) from a website and store locally (to my desktop) or to a database. I am able to parse a URL with tabular content and display/write to file, but I need to determine a way to retrieve spreadsheet content that requires clicking a button to download the data. More importantly, I need to be able to be able to retrieve spreadsheet data embedded within multiple separate pages as displayed on a webpage. Below is a sample script that displays tabular data from a website.
import urllib3
from bs4 import BeautifulSoup
url = 'https://www.runnersworld.com/races-places/a20823734/these-are-the-worlds-fastest-marathoners-and-marathon-courses/'
http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'))
print(soup)
I have inspected the Javascript tool that is the equivalent of manually exporting data on a website through a button click, but I need to find a way to automate this via a Python script...any assistance is most appreciated.
Based on your comment
@SergeyK - Here is a link to the website with the data. I need to find a way to download the CSV listed under the "Starting Up" section of this URL: browserstack.com/test-on-the-right-mobile-devices
There are three download buttons on the site u mentioned. Yes they re the same and only one file will be downloaded, but as an example.
import requests
from bs4 import BeautifulSoup
import urllib.parse
response = requests.get('https://www.browserstack.com/test-on-the-right-mobile-devices')
for csv_href in BeautifulSoup(response.text, 'lxml').find_all('div', class_='download-csv'):
link = 'https://www.browserstack.com/' + csv_href.findNext('a').get('href')
file_name = urllib.parse.unquote(link).replace(" ", "").split('/')[-1]
data = requests.get(link)
with open(file_name, 'wb') as file:
print(f'{file_name} saved from {link}')
file.write(data.content)
OUTPUT
BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
Or just Starting Up section without loop:
soup = BeautifulSoup(response.text, 'lxml').find('div', {'data-trigger': 'startingup'})
link = 'https://www.browserstack.com/' + soup.findNext('a').get('href')