python, selenium, selenium-webdriver, selenium-chromedriver, selenium-rc

How to scrape a product details page with Selenium


I am learning Selenium. Right now this code can scrape all the product titles from the front page of this URL, https://www.daraz.com.bd/consumer-electronics/?spm=a2a0e.pdp.breadcrumb.1.4d20110bzkC0bn, but I also want to click each product link on that page, which will take me to its product details page, so that I can scrape information from the details page. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# argument for incognito Chrome
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

browser = webdriver.Chrome()

browser.get("https://www.daraz.com.bd/consumer-electronics/?spm=a2a0e.pdp.breadcrumb.1.4d20110bzkC0bn")

# Wait 20 seconds for page to load
timeout = 20
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='c16H9d']")))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()
    raise



# find_elements returns a list of Selenium WebElement objects.
titles_element = browser.find_elements(By.XPATH, "//div[@class='c16H9d']")


# use a list comprehension to get the actual product titles and not the Selenium objects.
titles = [x.text for x in titles_element]
# print out all the titles.
print('titles:')
print(titles, '\n')
browser.quit()

Solution

  • You could use BeautifulSoup to make life easier.

    I've modified your code slightly to illustrate how you can visit each individual product link on the page.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from bs4 import BeautifulSoup
    
    # argument for incognito Chrome
    option = Options()
    option.add_argument("--incognito")
    
    
    browser = webdriver.Chrome(options=option)
    
    browser.get("https://www.daraz.com.bd/consumer-electronics/?spm=a2a0e.pdp.breadcrumb.1.4d20110bzkC0bn")
    
    # Wait 20 seconds for page to load
    timeout = 20
    try:
        WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='c16H9d']")))
    except TimeoutException:
        print("Timed out waiting for page to load")
        browser.quit()
        raise
    
    soup = BeautifulSoup(browser.page_source, "html.parser")
    
    product_items = soup.find_all("div", attrs={"data-qa-locator": "product-item"})
    for item in product_items:
        item_url = f"https:{item.find('a')['href']}"
        print(item_url)
    
        browser.get(item_url)
    
        item_soup = BeautifulSoup(browser.page_source, "html.parser")
    
        # Use the item_soup to find details about the item from its URL
        # (see the sketch just after this code block).
    
    browser.quit()
    

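    As an example of that last step, here is a minimal sketch of pulling two fields out of item_soup. Note that the h1 tag and the pdp-price class below are assumptions for illustration only; inspect an actual Daraz product page and adjust the selectors to its real markup:

    # NOTE: hypothetical selectors; verify them against the live page.
    title_el = item_soup.find("h1")
    price_el = item_soup.find("span", attrs={"class": "pdp-price"})

    title = title_el.get_text(strip=True) if title_el else None
    price = price_el.get_text(strip=True) if price_el else None
    print(title, price)
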
    It is, in short, exactly what arundeep chohan mentioned in the comment section. You can also create a second browser instance, say browser1 = webdriver.Chrome(), and use it to visit the product URLs while the first browser stays on the listing page.
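
    A minimal sketch of that two-browser pattern might look like the following; it reuses the product-item locator from the code above and omits the explicit waits for brevity:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome()   # stays on the category listing
    browser1 = webdriver.Chrome()  # dedicated to the product pages

    browser.get("https://www.daraz.com.bd/consumer-electronics/?spm=a2a0e.pdp.breadcrumb.1.4d20110bzkC0bn")
    soup = BeautifulSoup(browser.page_source, "html.parser")

    for item in soup.find_all("div", attrs={"data-qa-locator": "product-item"}):
        link = item.find("a")
        if link is None:
            continue
        browser1.get(f"https:{link['href']}")  # the first browser never leaves the listing
        item_soup = BeautifulSoup(browser1.page_source, "html.parser")
        # scrape the product details from item_soup here

    browser.quit()
    browser1.quit()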

    Also, I realized that incognito mode is not actually enabled in your script: you create the options object but never pass it to the driver. You need to pass it via the options argument of webdriver.Chrome, as done above.
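
    In isolation, the fix looks like this:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    option = Options()
    option.add_argument("--incognito")
    browser = webdriver.Chrome(options=option)  # without options=..., Chrome launches with defaults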