python css selenium-webdriver webdriver text-extraction

How do I automatically scroll to a specific section in the DOM using Selenium?


I'm trying to use Selenium to scroll to a specific section on a webpage and retrieve the text from that section.

Context:

I’m working with a webpage that disables text highlighting through CSS properties like user-select: none and -webkit-user-select: none. I can disable these properties with JavaScript, but my main challenge right now is automatically scrolling down to the "Production / Artist" section in the DOM and then fetching the text.

Here’s the URL of the webpage I’m working with:
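https://www.art-mate.net/doc/78492?name=%E6%A8%82%E3%83%BB%E8%AA%BC%E7%8D%A8%E5%A5%8F%E5%AE%B6%E6%A8%82%E5%9C%98%E2%94%80%E2%94%80%E5%A4%A7%E6%8F%90%E7%90%B4%E8%88%87%E9%A6%AC%E7%89%B9%E8%AB%BE%E7%90%B4%E3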

I’ve tried using Selenium to scroll to the "Production / Artist" section, but I’m not sure if I’m using the correct method for this particular page structure.

My Current Code:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize WebDriver
driver = webdriver.Chrome()

# Open the URL
url = "https://www.art-mate.net/doc/78492?name=%E6%A8%82%E3%83%BB%E8%AA%BC%E7%8D%A8%E5%A5%8F%E5%AE%B6%E6%A8%82%E5%9C%98%E2%94%80%E2%94%80%E5%A4%A7%E6%8F%90%E7%90%B4%E8%88%87%E9%A6%AC%E7%89%B9%E8%AB%BE%E7%90%B4%E3"
driver.get(url)

# Scroll to the "Production / Artist" section
element = driver.find_element(By.XPATH, "//h2[text()='Production / Artist']")
driver.execute_script("arguments[0].scrollIntoView();", element)

# Now attempt to copy the text from the section
production_artist_section = driver.find_element(By.XPATH, "//div[contains(text(), 'Production / Artist')]")
print(production_artist_section.text)

# Close the driver
driver.quit()

The Issue:

My Question:

How do I ensure that Selenium scrolls smoothly and accurately to the "Production / Artist" section on the page before I attempt to fetch the text?

Any help or advice on how to optimize the scrolling behavior would be greatly appreciated!


Solution

  • The code below switches the page to English, then extracts the roles and names from the "Production / Artist" section and stores them in two lists:

    Code:

    import time
    from selenium import webdriver
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    url = "https://www.art-mate.net/doc/78492?name=%E6%A8%82%E3%83%BB%E8%AA%BC%E7%8D%A8%E5%A5%8F%E5%AE%B6%E6%A8%82%E5%9C%98%E2%94%80%E2%94%80%E5%A4%A7%E6%8F%90%E7%90%B4%E8%88%87%E9%A6%AC%E7%89%B9%E8%AB%BE%E7%90%B4%E3"
    wait = WebDriverWait(driver, 10)
    
    try:
        driver.get(url)
        driver.maximize_window()
    
        # Switch the page to English by clicking the 'En' language link
        wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='cms_lang cms_lang_en']"))).click()
        # Fixed pause after the click, presumably so the English page can finish loading
        time.sleep(5)
    
        # Wait until the role labels and the name links are visible
        role_elements = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='people_cell people_role']")))
        name_elements = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='people_box']//a")))
    
        # Collect the text of each web element into the respective lists
        people_roles = [el.text for el in role_elements]
        people_names = [el.text for el in name_elements]
    
        print("People roles:", people_roles)
        print("People names:", people_names)
    finally:
        # Always close the browser, even if a wait times out
        driver.quit()
    
    

    Console result:

    People roles: ['Presented by', 'Artistic Director / Cello', 'Ondes Martenot', 'Composer', 'Viola', 'Performed by']
    People names: ['Musicus Society', 'Trey Lee', 'Nadia Ratsimandresy', 'Seung-Won Oh', 'Aurélie Entringer', 'Musicus Soloists Hong Kong']
    
    Process finished with exit code 0
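
  • On the scrolling part of the question: a minimal sketch, assuming you still want the section scrolled into view before reading it. The helper name `scroll_to`, its parameters, and the two script constants are hypothetical, not part of Selenium. It only relies on `driver.execute_script` and the browser's standard `scrollIntoView` and `getBoundingClientRect` calls. It treats "fully inside the viewport" as the success condition, so an element taller than the window would time out.

```python
import time

# Runs in the page: scroll the element using the given block alignment
# and scroll behavior ("auto", "instant" or "smooth").
SCROLL_JS = """
arguments[0].scrollIntoView({block: arguments[1], behavior: arguments[2]});
"""

# Runs in the page: True when the element lies fully inside the viewport.
IN_VIEWPORT_JS = """
const r = arguments[0].getBoundingClientRect();
const h = window.innerHeight || document.documentElement.clientHeight;
return r.top >= 0 && r.bottom <= h;
"""


def scroll_to(driver, element, block="center", behavior="auto",
              timeout=5.0, poll=0.1):
    """Scroll `element` into view and wait until the scroll has settled.

    Returns True once the element is fully inside the viewport, or False
    if that has not happened within `timeout` seconds. Polling matters
    mainly for behavior="smooth", where the scroll is animated.
    """
    driver.execute_script(SCROLL_JS, element, block, behavior)
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(IN_VIEWPORT_JS, element):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

    With the `driver` from the code above, usage might look like this (the `people_box` class is taken from the XPath used earlier):

    section = driver.find_element(By.XPATH, "//div[@class='people_box']")
    if scroll_to(driver, section, behavior="smooth"):
        print(section.text)

    Note that `element.text` does not depend on `user-select`, since that CSS property only affects selection with the mouse or keyboard, so no JavaScript override is needed just to read the text.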