pythonhtmlexcelautomationhtml-table

Export Excel Spreadsheet From Website - Python


I am trying to find a way to export a Microsoft Excel spreadsheet (.xlsx) from a website and store locally (to my desktop) or to a database. I am able to parse a URL with tabular content and display/write to file, but I need to determine a way to retrieve spreadsheet content that requires clicking a button to download the data. More importantly, I need to be able to be able to retrieve spreadsheet data embedded within multiple separate pages as displayed on a webpage. Below is a sample script that displays tabular data from a website.

import urllib3
from bs4 import BeautifulSoup

url = 'https://www.runnersworld.com/races-places/a20823734/these-are-the-worlds-fastest-marathoners-and-marathon-courses/'

http = urllib3.PoolManager()
response = http.request('GET', url)
soup = BeautifulSoup(response.data.decode('utf-8'))
print(soup)  

I have inspected the Javascript tool that is the equivalent of manually exporting data on a website through a button click, but I need to find a way to automate this via a Python script...any assistance is most appreciated.


Solution

  • Based on your comment

    @SergeyK - Here is a link to the website with the data. I need to find a way to download the CSV listed under the "Starting Up" section of this URL: browserstack.com/test-on-the-right-mobile-devices

    There are three download buttons on the site u mentioned. Yes they re the same and only one file will be downloaded, but as an example.

    import requests
    from bs4 import BeautifulSoup
    import urllib.parse
    
    
    response = requests.get('https://www.browserstack.com/test-on-the-right-mobile-devices')
    for csv_href in BeautifulSoup(response.text, 'lxml').find_all('div', class_='download-csv'):
        link = 'https://www.browserstack.com/' + csv_href.findNext('a').get('href')
        file_name = urllib.parse.unquote(link).replace(" ", "").split('/')[-1]
        data = requests.get(link)
        with open(file_name, 'wb') as file:
            print(f'{file_name} saved from {link}')
            file.write(data.content)
    

    OUTPUT

    BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
    BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
    BrowserStack-Listofdevicestoteston.csv saved from https://www.browserstack.com/downloads/BrowserStack%20-%20List%20of%20devices%20to%20test%20on.csv
    

    Or just Starting Up section without loop:

    soup = BeautifulSoup(response.text, 'lxml').find('div', {'data-trigger': 'startingup'})
    link = 'https://www.browserstack.com/' + soup.findNext('a').get('href')