python html selenium-webdriver beautifulsoup dynamically-generated

Loading data from a dynamically generated HTML page


Is Selenium the only library that can access the data in a table on a web page, specifically the one here?

When I parse these pages with bs4 (BeautifulSoup), the tables contain only the headers and no data. It works locally with Selenium, but the machine I'm working on has no Chrome, or any other browser for that matter. Is there another way?


Solution

  • The page you linked to fills its table by loading another resource with AJAX (you can see the request in the Network tab of your browser's developer tools):

    https://httpd.sslmate.com/ocspwatch/problems

    It returns plain JSON, so you don't need to scrape anything, and no browser is required:

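    import requests
    
    # The table on the page is populated from this JSON endpoint via AJAX
    response = requests.get("https://httpd.sslmate.com/ocspwatch/problems", timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    for cert in response.json():
        print(cert["problem_time"], ":", cert["problem"], "(", cert["operator_name"], ")")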
    

    Output:

    2023-03-10T00:11:32+00:00 : error parsing OCSP response: ocsp: error from server: unauthorized ( GoDaddy )
    2023-03-10T00:14:58+00:00 : error parsing OCSP response: OCSP response contains bad number of responses ( eMudhra Technologies Limited )
    2023-03-10T00:14:57+00:00 : OCSP responder does not know this certificate ( Netlock )
    ...
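If you want the data back in tabular form, the way the page displays it, you can work on the parsed JSON directly instead of scraping HTML. This is a minimal sketch, assuming each record has at least the `problem_time`, `problem`, and `operator_name` fields used in the answer (the live feed may carry more). The sample records are copied from the output above so it runs offline, and the helper names `to_rows`, `problems_per_operator`, and `to_csv` are hypothetical, not part of any library.

```python
import csv
import io
from collections import Counter

# Sample records copied from the output above; with live data, use the list
# returned by requests.get(...).json() instead (assumed to have these fields).
SAMPLE = [
    {"problem_time": "2023-03-10T00:11:32+00:00",
     "problem": "error parsing OCSP response: ocsp: error from server: unauthorized",
     "operator_name": "GoDaddy"},
    {"problem_time": "2023-03-10T00:14:58+00:00",
     "problem": "error parsing OCSP response: OCSP response contains bad number of responses",
     "operator_name": "eMudhra Technologies Limited"},
    {"problem_time": "2023-03-10T00:14:57+00:00",
     "problem": "OCSP responder does not know this certificate",
     "operator_name": "Netlock"},
]


def to_rows(certificates):
    """Turn the JSON records into (time, operator, problem) tuples, one per table row."""
    return [(c["problem_time"], c["operator_name"], c["problem"]) for c in certificates]


def problems_per_operator(certificates):
    """Count how many problems each CA operator has in the feed."""
    return Counter(c["operator_name"] for c in certificates)


def to_csv(certificates):
    """Serialize the rows, with a header line, to a CSV string (input order preserved)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["problem_time", "operator_name", "problem"])
    writer.writerows(to_rows(certificates))
    return buf.getvalue()


if __name__ == "__main__":
    # ISO 8601 timestamps with the same offset sort correctly as strings
    for row in sorted(to_rows(SAMPLE)):
        print(" | ".join(row))
    print(problems_per_operator(SAMPLE).most_common())
```

Keeping the fetch separate from these helpers means the parsing can be tested without network access, and the CSV output can be opened in a spreadsheet if you just need the table.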