selenium-webdriver web dynamic driver screen-scraping

Extracting data from a dynamic website using Selenium


I am trying to extract the participants' names and the number of modules each has completed from this website and save them to a CSV file: https://learn.microsoft.com/training/challenges?id=f66f0d57-d644-44d1-9faf-112b18a0ef92

Below is my code. A few days ago I had a working version, but after I modified it nothing works anymore.

from selenium import webdriver
import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

# Initialize the web driver
driver = webdriver.Chrome() 

# URL of the website
url = "https://learn.microsoft.com/training/challenges?id=f66f0d57-d644-44d1-9faf-112b18a0ef92"
driver.get(url)

# Locate participant names and modules completed elements
participant_names = driver.find_element(By.CSS_SELECTOR, ".is-hidden-mobile.leaderboard-name")
modules_completed = driver.find_element(By.CSS_SELECTOR, "span")

# Extract data and store it in a list
data = []

for name, modules in zip(participant_names, modules_completed):
    data.append([name.text, modules.text])

# Define the CSV file name
csv_file_name = 'participants.csv'

# Write data to a CSV file
with open(csv_file_name, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['Participant Name', 'Modules Completed'])  # Write header
    csvwriter.writerows(data)

# Close the browser
driver.quit()

print(f"Data has been scraped and saved to {csv_file_name}.")

This is the inspect element code for the name:

<span class="is-hidden-mobile leaderboard-name"><!---->Abhishek Kumar<!----></span>

For modules completed:

<span><!---->12/12<!----></span>

The leaderboard is also paginated. I tried to handle that, but removed the pagination code because of errors, and even the basic first-page code does not work. The pagination markup:

<button type="button" class="pagination-link is-current" data-page="1" aria-label="Page 1 of 4" aria-current="true">
                    1
                </button>

I would really appreciate any code or direction that helps me complete this without errors.

The error I encountered is below:

Traceback (most recent call last):
  File "C:\Users\misss\AppData\Local\Programs\Python\Python311\wspmlsa.py", line 17, in <module>
    participant_names = driver.find_element(By.CSS_SELECTOR, ".is-hidden-mobile.leaderboard-name")
  File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 738, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
  File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 344, in execute
    self.error_handler.check_response(response)
  File "C:\Users\misss\AppData\Local\Programs\Python\Python311\Lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 229, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".is-hidden-mobile.leaderboard-name"}
  (Session info: chrome=117.0.5938.134); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
    GetHandleVerifier [0x00007FF65ADD7D12+55474]
    (No symbol) [0x00007FF65AD477C2]
    (No symbol) [0x00007FF65ABFE0EB]
    (No symbol) [0x00007FF65AC3EBAC]
    (No symbol) [0x00007FF65AC3ED2C]
    (No symbol) [0x00007FF65AC79F77]
    (No symbol) [0x00007FF65AC5F19F]
    (No symbol) [0x00007FF65AC77EF2]
    (No symbol) [0x00007FF65AC5EF33]
    (No symbol) [0x00007FF65AC33D41]
    (No symbol) [0x00007FF65AC34F84]
    GetHandleVerifier [0x00007FF65B13B762+3609346]
    GetHandleVerifier [0x00007FF65B191A80+3962400]
    GetHandleVerifier [0x00007FF65B189F0F+3930799]
    GetHandleVerifier [0x00007FF65AE73CA6+694342]
    (No symbol) [0x00007FF65AD52218]
    (No symbol) [0x00007FF65AD4E484]
    (No symbol) [0x00007FF65AD4E5B2]
    (No symbol) [0x00007FF65AD3EE13]
    BaseThreadInitThunk [0x00007FFE68417344+20]
    RtlUserThreadStart [0x00007FFE6A2226B1+33]

Solution

  • The page is hydrated with data from an API. Why don't you request that API endpoint directly? You can find it in the browser's dev tools, under the Network tab.

    Here is one way to do it:

    import requests
    import pandas as pd
    
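    # Browser-like User-Agent, in case the default one is rejected
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
    }
    
    r = requests.get("https://learn.microsoft.com/api/challenges/f66f0d57-d644-44d1-9faf-112b18a0ef92/leaderboard?$top=1000&$skip=0&locale=en-gb", headers=headers)
    r.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    # Flatten the 'results' list of the JSON response into a DataFrame
    df = pd.json_normalize(r.json(), record_path=['results'])
    print(df[['rank', 'score', 'userDisplayName']])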
    

    Result in terminal:

        rank    score   userDisplayName
    0   1   12.0    _11KRISHNA VAMSI
    1   2   12.0    ABDUL SAMAD KHAN
    2   3   12.0    Abhishek Kumar
    3   4   12.0    Aditya Srivastav
    4   5   12.0    Akshay Gupta
    5   6   12.0    Harshvardhan Nayakal
    6   7   12.0    Khushi
    7   8   12.0    Kreeti Jindal
    8   9   12.0    Md Ibrahim Noman
    9   10  12.0    MUKESH PAL
    10  11  12.0    Patel Harsh Satishkumar
    11  12  12.0    Prashant Dwivedi
    12  13  12.0    Rudraraju Sriya
    13  14  12.0    Sagar Chintamani
    14  15  12.0    Samvarthika . C
    15  16  12.0    Sandeep Kumar Patel
    16  17  12.0    Shashank Kumar Srivastava
    17  18  12.0    Shivanshu_Nigam
    18  19  12.0    Smriti Tiwari
    19  20  12.0    udit kumar singh
    20  21  12.0    Viraj Bhutada
    21  22  7.0     Akanksha Pal
    22  23  2.0     Md Tawsif Mahmud Toha
    23  24  0.0     Asritha mudunuri
    24  25  0.0     Chandramani kumari
    25  26  0.0     Kandukuri Jaswanth
    26  27  0.0     Lilanjan Barman
    27  28  0.0     Nelissa
    28  29  0.0     Paresh Maheshwari
    29  30  0.0     prachothan reddy kuthuru
    30  31  0.0     Radhika Garg
    31  32  0.0     Roopesh Ranjan
    32  33  0.0     Roopesh Ranjan
    33  34  0.0     sayyid hassan shaabani
    34  35  0.0     SEELAM ALEXANDER
    35  36  0.0     Tamilarasan S
    36  37  0.0     Vashu Agarwal
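
Since the question asks for a CSV file, here is a minimal sketch of writing the names and scores out with the standard csv module. The helper names `leaderboard_rows` and `write_csv` are made up for illustration, and the sketch assumes every entry carries a `userDisplayName` and a numeric `score`, as in the output above.

```python
import csv


def leaderboard_rows(payload):
    """Turn the leaderboard JSON into (name, modules completed) rows."""
    rows = []
    for entry in payload.get("results", []):
        # Assumption: 'score' is always numeric (it shows up as 12.0, 7.0, ...).
        rows.append((entry["userDisplayName"], int(entry["score"])))
    return rows


def write_csv(rows, path="participants.csv"):
    """Write the rows to a CSV file, using the header from the question."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Participant Name", "Modules Completed"])
        writer.writerows(rows)
```

With the response from the request above, `write_csv(leaderboard_rows(r.json()))` would produce participants.csv. If you already have the DataFrame, `df[['userDisplayName', 'score']].to_csv('participants.csv', index=False)` does much the same, only with the API's column names as the header.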
    

    You can explore that JSON response; it may contain other data you are interested in.

    See the Requests and pandas documentation for more details.
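
On pagination: the request above asks for `$top=1000` entries, so all 37 participants come back in one response and the page buttons never have to be clicked. If a leaderboard were too large for one response, a loop like the following could step through it. This is only a sketch: `fetch_all` and `fetch_leaderboard_page` are hypothetical helper names, and it assumes the endpoint treats `$skip` as an offset, which is a guess based on the URL and not something confirmed here.

```python
def fetch_all(fetch_page, page_size=100, max_pages=50):
    """Collect results window by window until a short or empty page comes back.

    fetch_page(skip, top) must return the list of results for that window.
    max_pages is a safety cap in case the server ignores the offset.
    """
    results = []
    skip = 0
    for _ in range(max_pages):
        page = fetch_page(skip, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # last (possibly partial) page reached
        skip += page_size
    return results


def fetch_leaderboard_page(skip, top):
    """Request one window of the leaderboard from the endpoint used above."""
    import requests  # imported here so fetch_all can be tried without it

    url = ("https://learn.microsoft.com/api/challenges/"
           "f66f0d57-d644-44d1-9faf-112b18a0ef92/leaderboard")
    # Assumption: $skip is honoured as an offset into the leaderboard.
    params = {"$top": top, "$skip": skip, "locale": "en-gb"}
    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    return r.json()["results"]
```

Calling `fetch_all(fetch_leaderboard_page, page_size=10)` would then return the combined list of entries. Add the User-Agent header from the first snippet if the request is rejected without it.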