Tags: html, python-3.x, web-scraping, beautifulsoup, expandable

Get the full HTML of a page with dynamically expanded containers in Python


I am trying to pull the full HTML from ratemyprofessors.com. However, at the bottom of the page there is a "Load More Ratings" button that loads more comments.

I am using requests.get(url) and BeautifulSoup, but that only returns the first 20 comments. Is there a way to have the page load all the comments before it returns?

Here is what I am currently doing that gives the top 20 comments, but not all of them.

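    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Collect the text of every comment container present in the initial HTML.
    comments = []
    for j in soup.find_all('div', attrs={'class': 'Comments__StyledComments-dzzyvm-0 dEfjGB'}):
        comments.append(j.text)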

Solution

  • BeautifulSoup is an HTML parser for static pages rather than a renderer for dynamic web apps: it only sees the HTML the server sends back and never runs the JavaScript behind the button.

    You could achieve what you want with Selenium driving a headless browser: render the full page, then click the "Load More Ratings" button repeatedly until it no longer appears.

    Example: Clicking on a link via selenium

    Since you're already using Requests, another option that might work is Requests-HTML, which also supports dynamic rendering by calling .html.render() on the response object.

    Example: https://requests-html.kennethreitz.org/index.html#requests_html.HTML.render

    Reference: Clicking link using beautifulsoup in python
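The Selenium approach above could look roughly like the sketch below. It is untested against the live site and rests on a few assumptions: the button is found by its visible text through a hypothetical XPath (`LOAD_MORE_XPATH`), Chrome and a matching driver are installed, and a fixed pause is long enough for each batch of comments to load. The helper names `click_until_gone` and `fetch_full_html` are made up for illustration.

```python
import time

# Assumption: the button can be located by its visible text; adjust if the markup differs.
LOAD_MORE_XPATH = "//button[contains(., 'Load More Ratings')]"


def click_until_gone(driver, xpath=LOAD_MORE_XPATH, pause=1.0, max_clicks=500):
    """Click the element matching `xpath` until it no longer appears.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list (instead of raising) when nothing
        # matches. "xpath" is the value of Selenium's By.XPATH constant.
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break
        # Clicking through JavaScript sidesteps overlays that can intercept a normal click.
        driver.execute_script("arguments[0].click();", buttons[0])
        clicks += 1
        # Crude fixed wait for the next batch; if the site is slow, the loop may stop early.
        time.sleep(pause)
    return clicks


def fetch_full_html(url):
    """Open `url` in headless Chrome, expand all ratings, and return the final HTML."""
    # Imported here so click_until_gone can be tried without a browser installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        click_until_gone(driver)
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by `fetch_full_html(url)` can then be passed to `BeautifulSoup(html, "html.parser")` in place of `response.text`, so the comment-extraction loop from the question stays unchanged. The `max_clicks` cap is only a safety net against a button that never disappears.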