python selenium-webdriver web-scraping data-analysis

Selenium isn't recognizing a button on my webpage


I am trying to automate the data entry for a project I'm working on by writing a Selenium script to pull all of the data from a survey report page. My current issue is that one of the lines of code won't recognize a button in the page's HTML in order to click it. The webpage follows a mostly consistent layout: each question is a drop-down menu that reveals several graphs of disaggregations of the data, and each graph has its own menu button. What I want this code to do is open each question, open each disaggregation menu, click "View data table", harvest the data, then repeat for every disaggregation in a question and for all questions. Once all the data is scraped, I'll work on formatting it how I need it and writing it to a CSV.

For the goal I've described, I haven't finished writing the code yet, but I have hit a snag early on that I've been trying to fix for days. In the code below, the button to click on the disaggregation menu won't work (it either gives me an error or gets ignored altogether). The question drop-down works, but something is wrong with the disaggregations part (disagg_buttons). See below:


URL = "https://secure.panoramaed.com/ride/understand/1302972/survey_results/27746568#/questions"
URL2 = "https://secure.panoramaed.com/ride/understand?auth_token=geZrUH8yRr8_Ln_C9LH3"

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import os

driver=webdriver.Firefox()
driver.get(URL2)
driver.get(URL)
html_text = driver.page_source

# Finds each question in the webpage based on the class "expandable-row". This will identify all of the questions on the page.
questions_buttons = driver.find_elements(By.CLASS_NAME, "expandable-row")

# Opens each drop down menu so that we can see all of the disaggregations of each question.
for question_button in questions_buttons:
    question_button.click()

    # Finds each disaggregation breakdown button by the class "highcharts-a11y-proxy-button.highcharts-no-tooltip".
    disagg_buttons = driver.find_elements(By.CSS_SELECTOR, 'button.highcharts-a11y-proxy-button highcharts-no-tooltip')

    for disagg_button in disagg_buttons:
        disagg_button.click()

        # When the disaggregation button is clicked, this will select the first item in the list which is the "View Data Table" option.
        view_datatable_button = driver.find_element(By.CLASS_NAME, "li")
        view_datatable_button.click()

rendered_html = driver.find_elements(By.CLASS_NAME, "ng-scope")
for e in rendered_html:
    pass


# soup=BeautifulSoup(html_text, "html.parser")
# soup_pretty = soup.prettify()
# print(soup_pretty)


# file = open("output.txt", "a")
# file.write(html_text)
# file.close()

I've tried a couple of different options, including find_elements(By.CLASS_NAME, "highcharts-a11y-proxy-button highcharts-no-tooltip"), find_elements(By.CSS_SELECTOR, 'button.highcharts-a11y-proxy-button.highcharts-no-tooltip'), and find_elements(By.XPATH, "//button[starts-with(@class, 'highcharts-a11y-proxy-button')]"), but have not had success with any of these methods.


Solution

  • I'd suggest the following approach:

    1. iterate over all questions and expand each one; then
    2. trigger the popup menu for each chart; then
    3. choose the "View data table" option in each menu.

    This will expand all of the tables for a specific question.

    Then you can extract the information from each table.

    import time
    from io import StringIO
    
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    import pandas as pd
    
    URL = "https://secure.panoramaed.com/ride/understand/1302972/survey_results/27746568#/questions"
    URL_AUTH = "https://secure.panoramaed.com/ride/understand?auth_token=geZrUH8yRr8_Ln_C9LH3"
    
    DRIVER_PATH = "/usr/bin/chromedriver"
    
    options = Options()
    
    service = Service(executable_path=DRIVER_PATH)
    
    driver = webdriver.Chrome(service=service, options=options)
    
    # Authenticate.
    driver.get(URL_AUTH)
    # Open target page.
    driver.get(URL)
    
    time.sleep(5)
    
    questions = driver.find_elements(By.CSS_SELECTOR, ".expandable-row")
    
    for question in questions:
        # Open question.
        driver.execute_script("arguments[0].click();", question)
        time.sleep(3)
    
        # Open context menus.
        menus = driver.find_elements(By.XPATH, "//button[@aria-label='View chart menu, Chart']")
    
        for menu in menus:
            driver.execute_script("arguments[0].click();", menu)
            time.sleep(1)
        
        # Get buttons to display tables.
        views = driver.find_elements(By.XPATH, "//li[contains(text(), 'View data table')]")
    
        for view in views:
            driver.execute_script("arguments[0].click();", view)
            time.sleep(1)
    
        # Now scrape the contents of the revealed tables.
        #
        tables = driver.find_elements(By.CSS_SELECTOR, ".highcharts-data-table > table")
        for table in tables:
            df = pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
            print(df)
    
        # Close question.
        driver.execute_script("arguments[0].click();", question)
        time.sleep(3)
    
    driver.quit()
    

    I had trouble with the .click() method on some of the items, so I resorted to doing it via JavaScript.
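    If you'd prefer to keep the native .click() where it works, you could wrap the fallback in a small helper. This is just a sketch; safe_click is a name invented here, not part of Selenium, and in real code you'd catch selenium's ElementClickInterceptedException rather than a bare Exception:

    ```python
    def safe_click(driver, element):
        """Try a native click first; fall back to a JavaScript click.

        Useful when an overlay or sticky header intercepts the native
        click. `safe_click` is a helper invented for this sketch, not
        a Selenium API.
        """
        try:
            element.click()
            return "native"
        except Exception:
            # In practice, catch selenium.common.exceptions
            # .ElementClickInterceptedException here instead.
            driver.execute_script("arguments[0].click();", element)
            return "javascript"
    ```

    With that in place, the loops above could call safe_click(driver, menu) instead of driver.execute_script("arguments[0].click();", menu) directly.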

    This is what the output looks like:

                 Category  Responses                                                                                                                                                                                                        
    0  Not at all excited       1840                                                                                                                                                                                                        
    1    Slightly excited       2183                                                                                                                                                                                                        
    2    Somewhat excited       3801                                                                                                                                                                                                        
    3       Quite excited       1686                                                                                                                                                                                                        
    4   Extremely excited        726                                                                                                                                                                                                        
           Category  Percentage favorable responses                                                                                                                                                                                         
    0    Providence                              24                                                                                                                                                                                         
    1  Rhode Island                              20                                                                                                                                                                                         
                       Category  Providence  Rhode Island                                                                                                                                                                                   
    0                        No          23            19                                                                                                                                                                                   
    1  Yes, for part of the day          22            20                                                                                                                                                                                   
    2  Yes, for most of the day          33            26                                                                                                                                                                                   
                                       Category  Providence  Rhode Island                                                                                                                                                                   
    0                                    Female          21            18                                                                                                                                                                   
    1                                      Male          26            22                                                                                                                                                                   
    2                                 Nonbinary          13            15                                                                                                                                                                   
    3  I use another word to describe my gender          28            20                                                                                                                                                                   
    4      I prefer not to answer this question          26            19                                                                                                                                                                   
                                                Category  Providence  Rhode Island                                                                                                                                                          
    0  There is no one in the family or home who is c...          22            20                                                                                                                                                          
    1                                    0 days per week          23            18                                                                                                                                                          
    2                               1 or 2 days per week          20            18                                                                                                                                                          
    3                               3 to 5 days per week          31            24                                                                                                                                                          
    4                               6 or 7 days per week          38            28
    

    Obviously, rather than printing the tables to the console you'll want to save them to a file.
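    For example, you could collect the outerHTML of each table into a list inside the loop and write everything to one CSV at the end. A sketch, assuming pandas with an HTML parser (e.g. lxml) installed; save_tables is a helper name invented here:

    ```python
    from io import StringIO

    import pandas as pd


    def save_tables(table_htmls, path):
        """Parse a list of HTML table strings and write them to one CSV.

        Each table gets a `table_index` column so rows can be traced back
        to the chart they came from. The tables have differing column
        sets, so cells missing from a given table come out empty.
        """
        frames = []
        for i, html in enumerate(table_htmls):
            df = pd.read_html(StringIO(html))[0]
            df.insert(0, "table_index", i)
            frames.append(df)
        combined = pd.concat(frames, ignore_index=True)
        combined.to_csv(path, index=False)
        return combined
    ```

    In the main loop you would append table.get_attribute("outerHTML") to a list instead of printing, then call save_tables(htmls, "output.csv") once after the loop finishes.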

    This is the contents of requirements.txt for me:

    attrs==23.2.0
    beautifulsoup4==4.12.3
    bs4==0.0.2
    certifi==2024.7.4
    h11==0.14.0
    idna==3.7
    lxml==5.2.2
    numpy==2.0.0
    outcome==1.3.0.post0
    pandas==2.2.2
    PySocks==1.7.1
    python-dateutil==2.9.0.post0
    pytz==2024.1
    selenium==4.22.0
    six==1.16.0
    sniffio==1.3.1
    sortedcontainers==2.4.0
    soupsieve==2.5
    trio==0.26.0
    trio-websocket==0.11.1
    typing_extensions==4.12.2
    tzdata==2024.1
    urllib3==2.2.2
    websocket-client==1.8.0
    wsproto==1.2.0