I am trying to scrape a website for a mini project, and the data I need is hidden under a #shadow-root node in the HTML. I tried accessing it with Selenium using the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root

url = "https://new.abb.com/products/SK615502-D"

# Initializing the webdriver
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path="/Users/ritchevy/Desktop/scraping-glassdoor/chromedriver", options=options)

timeout = 10
wait = WebDriverWait(driver, timeout)

driver.set_window_size(1120, 1000)
driver.get(url)

root1 = driver.find_element(By.CSS_SELECTOR, "pis-products-details-attribute-groups")
shadow_root1 = expand_shadow_element(root1)
shadow_container_root = shadow_root1.find_element(By.CSS_SELECTOR, "div")
Upon execution it's giving me this error:
---> 35 shadow_container_root = shadow_root1.find_element(By.CSS_SELECTOR,"div")
36
AttributeError: 'dict' object has no attribute 'find_element'
Any idea how to resolve this?
I didn't have any issues running your original code, so I'm not sure why it didn't work for you. Since you are not running headless, did you see the required page being opened in the browser? You might have to insert a time.sleep() call after driver.get(url) to ensure that the page has loaded in the browser window before you hit the line that fails.
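If timing does turn out to be the problem, an explicit wait is usually more robust than a fixed sleep. A minimal sketch, assuming the pis-products-details-attribute-groups element is the one you need to appear before expanding its shadow root:

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get(url)

# Crude option: give the page a few seconds to render.
time.sleep(5)

# Better option: block until the shadow host element is present in the DOM.
wait = WebDriverWait(driver, 10)
root1 = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "pis-products-details-attribute-groups"))
)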
I made some minor tweaks and then grabbed the data from the tables in the shadow root node (assuming that this was the data that you are after).
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By

url = "https://new.abb.com/products/SK615502-D"

options = webdriver.ChromeOptions()

# * Use local Chrome.
# driver = webdriver.Chrome(options=options)

# * Use remote Chrome in Docker container.
driver = webdriver.Remote(
    "http://127.0.0.1:4444/wd/hub",
    DesiredCapabilities.CHROME,
    options=options
)

wait = WebDriverWait(driver, 10)

driver.get(url)

# Find the element enclosing the shadow root DOM.
root = driver.find_element(By.CSS_SELECTOR, "pis-products-details-attribute-groups")

# Extract the shadow root content.
shadow_root = driver.execute_script('return arguments[0].shadowRoot', root)
print(shadow_root)

for table in shadow_root.find_elements(By.CSS_SELECTOR, ".ext-attr-group .ext-attr-group-inner"):
    title = table.find_element(By.CSS_SELECTOR, "h4")
    print("====================================================")
    print("🚦 " + title.text)
    for row in table.find_elements(By.CSS_SELECTOR, ".ext-attr-group-content > div"):
        key = row.find_element(By.CSS_SELECTOR, ".col-md-4")
        value = row.find_element(By.CSS_SELECTOR, ".col-md-8")
        print(str(key.text) + " " + str(value.text))
I generally use a remote Selenium instance, but you can just comment that out and use webdriver.Chrome(options=options) instead.
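As for the AttributeError itself: with some ChromeDriver/Selenium version combinations, execute_script('return arguments[0].shadowRoot', ...) comes back as a plain dict rather than something you can call find_element on, which matches the 'dict' object has no attribute 'find_element' message. If you are on Selenium 4.1 or newer, you can skip the script entirely and use the element's shadow_root property; a minimal sketch, assuming your Selenium and Chrome versions support it:

from selenium.webdriver.common.by import By

root = driver.find_element(By.CSS_SELECTOR, "pis-products-details-attribute-groups")

# Selenium 4.1+ exposes the shadow root as a ShadowRoot object that
# supports find_element/find_elements with CSS selectors.
shadow_root = root.shadow_root
container = shadow_root.find_element(By.CSS_SELECTOR, "div")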
This is what some of the data look like:
====================================================
🚦 Ordering
Minimum Order Quantity: 1 piece
Customs Tariff Number: 85389099
Product Main Type: Accessories
====================================================
🚦 Popular Downloads
Data Sheet, Technical Information: 1SFC151007C02__
Instructions and Manuals: 1SFC151011M0201
CAD Dimensional Drawing: 2CDC001079B0201
====================================================
🚦 Dimensions
Product Net Width: 0.038 m
Product Net Depth / Length: 0.038 m
Product Net Height: 0.038 m
Product Net Weight: 0.08 kg
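If you want the values in a data structure rather than printed to stdout, the same loop can fill a nested dict. A sketch using the selectors from the code above (the key text may include a trailing colon, so tidy it up as needed):

# Collect each attribute group into a nested dict instead of printing it.
data = {}
for table in shadow_root.find_elements(By.CSS_SELECTOR, ".ext-attr-group .ext-attr-group-inner"):
    title = table.find_element(By.CSS_SELECTOR, "h4").text
    group = {}
    for row in table.find_elements(By.CSS_SELECTOR, ".ext-attr-group-content > div"):
        key = row.find_element(By.CSS_SELECTOR, ".col-md-4").text
        value = row.find_element(By.CSS_SELECTOR, ".col-md-8").text
        group[key.rstrip(":")] = value
    data[title] = group

print(data.get("Dimensions"))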