python beautifulsoup mechanicalsoup

Retrieve MechanicalSoup results after submitting a form


I am struggling to retrieve some results from a simple form submission. This is what I have so far:

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.set_verbose(2)
url = "https://www.dermcoll.edu.au/find-a-derm/"
browser.open(url)

form = browser.select_form("#find-derm-form")
browser["postcode"] = 3000
browser.submit_selected()

form.print_summary()

Where do these results end up...?

Many thanks


Solution

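  • MechanicalSoup does not execute JavaScript, and as per the MechanicalSoup FAQ you shouldn't use it for a dynamic, JavaScript-driven form, which seems to be the case for the website in your example. For a plain HTML form, browser.submit_selected() returns the response (with the parsed page attached as response.soup) and the browser's current page becomes the result page. Here the results appear to be rendered by JavaScript, so they would not be in that response.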

    Instead, you can use Selenium in combination with BeautifulSoup (and a little bit of help from webdriver-manager) to achieve your desired result. A short example would look like this:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    # set up the Chrome driver instance using webdriver_manager
    # (Selenium 4 takes the driver path through a Service object)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

    # navigate to the page
    driver.get("https://www.dermcoll.edu.au/find-a-derm/")

    # find the postcode input and enter your desired value
    postcode_input = driver.find_element(By.NAME, "postcode")
    postcode_input.send_keys("3000")

    # find the search button (it carries both classes) and perform the search
    search_button = driver.find_element(By.CSS_SELECTOR, ".search-btn.location_derm_search_icon")
    search_button.click()

    # the results are filled in by JavaScript, so wait until a result card is present
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#search_result .address_sec_contents"))
    )

    # get all search results and load them into a BeautifulSoup object for parsing
    search_results = driver.find_element(By.ID, "search_result")
    search_results = search_results.get_attribute("innerHTML")
    search_results = BeautifulSoup(search_results, "html.parser")

    # get individual result cards
    search_results = search_results.find_all("div", {"class": "address_sec_contents"})

    # now you can parse for whatever information you need
    names = [x.find("h4") for x in search_results]
    qualifications = [x.find("p", {"class": "qualification"}) for x in search_results]
    addresses = [x.find("address") for x in search_results]

    # close the browser when you're done
    driver.quit()

    While this approach may seem more involved, it copes with JavaScript-rendered pages and can easily be repurposed for many other situations where MechanicalSoup falls short.
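
The list comprehensions above return BeautifulSoup tags rather than text, and a card that lacks one of the fields yields None. As a minimal sketch of turning the cards into plain records, the snippet below parses a small hand-written HTML sample. The sample markup, the helper names text_or_none and parse_cards, and the record keys are assumptions made for illustration; only the tag and class names come from the example above, and the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the innerHTML of #search_result.
# The names and addresses are placeholders, not real search results.
SAMPLE_HTML = """
<div class="address_sec_contents">
  <h4>Dr Jane Example</h4>
  <p class="qualification">Qualification A</p>
  <address>1 Sample Street, Melbourne VIC 3000</address>
</div>
<div class="address_sec_contents">
  <h4>Dr John Placeholder</h4>
  <address>2 Sample Road, Melbourne VIC 3000</address>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_cards(html):
    """Turn result-card markup into a list of plain dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.find_all("div", class_="address_sec_contents"):
        records.append({
            "name": text_or_none(card.find("h4")),
            "qualification": text_or_none(card.find("p", class_="qualification")),
            "address": text_or_none(card.find("address")),
        })
    return records


records = parse_cards(SAMPLE_HTML)
for record in records:
    print(record)
```

In the Selenium script, the same function could be called on the innerHTML string of the search_result element instead of SAMPLE_HTML.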