python, html, selenium-webdriver, phantomjs, webdriverwait

Parse a dynamic HTML Page using PhantomJS and Python


I would like to scrape an HTML page whose content is not static but loaded with JavaScript.

I downgraded Selenium to version 3.3.0 in order to support PhantomJS (v4.9.x no longer supports PhantomJS) and wrote this code:

from selenium import webdriver

# Start PhantomJS, load the page, and look up the element by its id
driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
p_element = driver.find_element_by_id(id_='my-id')
print(p_element)

The error I'm getting is:

selenium.common.exceptions.NoSuchElementException: Message: "errorMessage":"Unable to find element with id 'my-id'"

The element I want to return is a <section> tag with a certain id, together with all of its sub-tags. The HTML content looks like this:

<section id="my-id" class="my-class">...</section>

Solution

  • This error message...

    selenium.common.exceptions.NoSuchElementException: Message: "errorMessage":"Unable to find element with id 'my-id'"
    

    ...implies that the element wasn't found within the HTML DOM.

    A possible reason is that the desired WebElement didn't render within the viewport, since PhantomJS by default initializes with a minimized viewport.


    Solution

    You need to initialize PhantomJS with a maximized viewport and induce WebDriverWait for visibility_of_element_located() while locating the element, as follows:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    # Start PhantomJS, load the page, and maximize the viewport
    driver = webdriver.PhantomJS('path-to-phantomJS')
    driver.get('my_url')
    driver.maximize_window()

    # Wait up to 20 seconds for the element to become visible before using it
    p_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "my-id")))
    print(p_element)
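
    Since the goal is to return the whole <section> element and all of its sub-tags, one option (an illustrative sketch, not part of the original code) is to read the located element's outerHTML attribute and then shut the browser down:

    # Retrieve the element's full markup via the standard WebElement
    # get_attribute() call with the "outerHTML" property
    section_html = p_element.get_attribute("outerHTML")
    print(section_html)

    # Clean up the PhantomJS process when done
    driver.quit()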