
Download an xlsx file by clicking a website button using Python


I'm writing a Python script that creates a COVID-19 dashboard for my country and state and updates it daily.

However, I am struggling to download one of the necessary files.

To download the file, I have to open the website (https://covid.saude.gov.br/) and click a button (class="btn-white md button button-solid button-has-icon-only ion-activatable ion-focusable hydrated ion-activated").

I tried downloading it through the download link directly, but the site generates a different link every time the button is clicked, and the link is a blob URL (it has blob: in front of the http part).

I am very grateful to anyone who tries to help, because the data will be used to monitor the progress of the disease here where I live.


Solution

  • You can use their API to get the file name:

    import requests

    # Headers sent with the API request.
    headers = {
        'authority': 'xx9p7hp1p7.execute-api.us-east-1.amazonaws.com',
        'x-parse-application-id': 'unAFkcaNDeXajurGB7LChj8SgQYS2ptm',
    }

    with requests.Session() as session:
        session.headers.update(headers)
        resp = session.get('https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeral')
        resp.raise_for_status()  # fail early on an HTTP error
        # File location reported by the first result.
        path = resp.json()['results'][0]['arquivo']['url']

    The x-parse-application-id doesn't seem to change. If it does, you can get the correct one by querying https://xx9p7hp1p7.execute-api.us-east-1.amazonaws.com/prod/PortalGeralApi and extracting it from ['planilha']['arquivo']['url'].
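
The snippet above stops once it has `path`, so here is a rough sketch of how the file itself might then be saved. It assumes `path` is a complete URL that can be fetched directly, which the answer does not confirm. If it turns out to be only a file name, it would first have to be joined onto a base URL that is not given here. The helper names `filename_from_url` and `download` and the fallback name `covid_data.xlsx` are made up for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="covid_data.xlsx"):
    """Return the last path segment of `url`, or `default` if there is none.

    `default` is a hypothetical fallback name, not something the site provides.
    """
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or default


def download(session, url, dest_dir="."):
    """Stream `url` to `dest_dir` with a requests.Session; return the saved path.

    Assumes `url` is a full, directly downloadable URL.
    """
    dest = Path(dest_dir) / filename_from_url(url)
    # stream=True avoids loading the whole spreadsheet into memory at once.
    with session.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
    return dest
```

Called as `download(session, path)` inside the `with requests.Session() as session:` block, this should write the spreadsheet to the current directory and return its location. The result could then be opened with, for example, `pandas.read_excel`.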