I am struggling to build a data frame. My current code works: it scrapes the course titles from the website. Now I want to write some functions that use a data frame to count how many URL links the page has, and then translate the text content of the website from English into Hindi. Can anyone help me with this?
`# Scraping the course links from the classcentral.com website
# This application uses the Selenium driver to access the web pages
#
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
url = "https://www.classcentral.com/collection/top-free-online-courses"
driver = webdriver.Chrome()
driver.get(url)
time.sleep(2)
all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
course_titles = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')
for title in course_titles:
    print(title.text)
`
I'm not sure I understand correctly, but if you want to load all courses, you'll have to keep clicking "Load more" until the button is no longer available. You can then get each course's URL from the href attribute:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
import pandas as pd
import time

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument("window-size=1920,1080")
# pass the options via `options=`; recent Selenium 4 releases no longer accept `chrome_options=`
driver = webdriver.Chrome(options=chrome_options)

url = "https://www.classcentral.com/collection/top-free-online-courses"
driver.get(url)

try:
    while True:
        # wait until button is clickable
        WebDriverWait(driver, 1).until(
            expected_conditions.element_to_be_clickable((By.XPATH, "//button[@data-name='LOAD_MORE']"))
        ).click()
        time.sleep(0.5)
except Exception:
    # the wait times out once the "Load more" button is gone, which ends the loop
    pass

all_courses = driver.find_element(by=By.CLASS_NAME, value='catalog-grid__results')
courses = all_courses.find_elements(by=By.CSS_SELECTOR, value='[class="color-charcoal course-name"]')

# one row per course: visible title plus the link target
df = pd.DataFrame([[course.text, course.get_attribute('href')] for course in courses],
                  columns=['Title (eng)', 'Link'])
driver.quit()
Output:
                                           Title (eng)                                               Link
0                        Medical Parasitology | 医学寄生虫学  https://www.classcentral.com/course/edx-medica...
1    Understanding Medical Research: Your Facebook ...  https://www.classcentral.com/course/medical-re...
2    An Introduction to Interactive Programming in ...  https://www.classcentral.com/course/interactiv...
3                                        Mountains 101  https://www.classcentral.com/course/mountains-...
4                       Quantum Mechanics for Everyone  https://www.classcentral.com/course/edx-quantu...
..                                                 ...                                                ...
260                          Web Security Fundamentals  https://www.classcentral.com/course/edx-web-se...
261  Viral Marketing and How to Craft Contagious Co...  https://www.classcentral.com/course/wharton-co...
262                              Introduction to Linux  https://www.classcentral.com/course/edx-introd...
263            Bitcoin and Cryptocurrency Technologies  https://www.classcentral.com/course/bitcointec...
264  Machine Learning Foundations: A Case Study App...  https://www.classcentral.com/course/ml-foundat...

[265 rows x 2 columns]
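For the other two parts of the question (counting the links and translating the titles into Hindi), something along these lines might be a starting point. It is only a rough sketch: `count_links` and `translate_titles` are names I made up, both take plain lists so you can feed them `df['Link'].tolist()` and `df['Title (eng)'].tolist()`, and the translator is passed in as a callable because nothing in this thread settles which translation service to use. The sample data at the bottom is a stand-in so the sketch runs without a browser.

```python
from collections import Counter
from urllib.parse import urlparse


def count_links(links):
    """Return (total, unique, per-domain counts) for an iterable of URLs.

    Hypothetical helper; pass it df['Link'].tolist() from the code above.
    """
    # Keep only non-empty strings: an element without an href yields None,
    # and pandas may store missing values as NaN (a float).
    urls = [u for u in links if isinstance(u, str) and u]
    per_domain = Counter(urlparse(u).netloc for u in urls)
    return len(urls), len(set(urls)), per_domain


def translate_titles(titles, translate):
    """Translate each title, calling `translate` only once per distinct title.

    `translate` is any callable str -> str. Which service backs it is an
    assumption left to the caller.
    """
    cache = {}
    result = []
    for title in titles:
        if title not in cache:
            cache[title] = translate(title)
        result.append(cache[title])
    return result


if __name__ == "__main__":
    # Stand-in data instead of the scraped DataFrame columns.
    links = [
        "https://www.classcentral.com/course/a",
        "https://www.classcentral.com/course/b",
        "https://www.classcentral.com/course/a",
        None,
    ]
    total, unique, per_domain = count_links(links)
    print(total, unique, dict(per_domain))

    # Placeholder translator; swap in a real English-to-Hindi call here.
    def fake_translate(text):
        return "[hi] " + text

    print(translate_titles(["Mountains 101", "Mountains 101"], fake_translate))
```

With the real DataFrame you would then do something like `df['Title (hi)'] = translate_titles(df['Title (eng)'].tolist(), my_translate)`, where `my_translate` is whatever function you end up using. One option, as far as I know, is the `deep-translator` package, via `GoogleTranslator(source='en', target='hi').translate`, but check its documentation first since I have not run it against this page. For a quick count you can also just use `len(df)` and `df['Link'].nunique()`.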