Tags: python, selenium-chromedriver, timeoutexception

Not able to click on an element in a Selenium (Python) automation script


Issue Description:

I am trying to automate a process that visits a website, hovers over the menu navigation bar, clicks each category option in the tier-1 dropdown, visits that page, scrapes the product details of the top 20 products there, and puts them in an Excel file. If a page does not contain any products, the script keeps scrolling down until it reaches the end of the page; if no product div is found by then, it goes back to the top of the page and clicks on the next category in the navigation panel.

I am working with Selenium (using Python) for this. I have attached my code below.

The scroll_and_click_view_more function scrolls down the page, the prod_vitals function scrapes the product details specific to each page, and the prod_count function extracts the total count of products on each page and creates a summary of all pages.

Error Description:

When I run the code below, every function works fine except one case. The first page that the code scrolls down does not have any product details. The script scrolls down the entire page and prints that no product tiles were found on that page; it is then supposed to click on the next category, but for some reason it cannot. It throws a TimeoutException, then clicks on the category after that, which works fine again. This website has two categories where no product tile is present, and for both of those pages the script is unable to click on the next available category. I am attaching a screenshot of the error.

Output of my code:

['/feature/unlock-your-courage.html', '/shop/new/women', '/shop/women', '/shop/men/bags', '/shop/collection', '/shop/gift/women/bestseller', '/shop/coachworld', '/shop/coachreloved/coach-reloved']
Reached the end of the page and no product tiles were found:  /feature/unlock-your-courage.html
Element with href /shop/new/women not clickable
Link: 
 /shop/women
Link:
 /shop/men/bags
Link:
 /shop/collection
Link:
 /shop/gift/women/bestseller
Reached the end of the page and no product tiles were found:  /shop/coachworld
Element with href /shop/coachreloved/coach-reloved not clickable

If you look at the output, the first line prints all the navigation categories available on the site. After that, the script visits all the URLs in that array and is able to click on all of them except the second and the eighth. FYI, the first and seventh categories do not contain any product tiles on their pages. All the remaining links are clickable. Clicking on each category and iterating over the loop is handled inside the WebScraper class.

Resolution Steps:

I have tried adding time.sleep() between the actions, but this still doesn't work. I also added a step that takes a screenshot when the TimeoutException happens; I can see the category is visible on screen, but it is still not clickable.

I am attaching a screenshot of the terminal output.

I am attaching my code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC                   
from bs4 import BeautifulSoup                    
import pandas as pd                               
import time                                         
import re
import os
import shutil
import datetime                                          
import openpyxl
import chromedriver_autoinstaller
from openpyxl import Workbook                                     
from openpyxl.styles import PatternFill
from openpyxl.utils.dataframe import dataframe_to_rows

#custom_path = r"c:\Users\DELL\Documents\Self_Project"           # Define the custom path where you want ChromeDriver to be installed
#temp_path=chromedriver_autoinstaller.install()                  # Installs the ChromeDriver to a temporary directory and returns the path to that directory.
#print("Temporary path",temp_path)
#final_path = os.path.join(custom_path, "chromedriver.exe")      # constructs and stores the full path to the ChromeDriver executable in the custom directory.
#shutil.move(temp_path, final_path)                              # Moves the ChromeDriver executable from the temporary directory to the custom directory.
#print("ChromeDriver installed at:", final_path)

date_time = datetime.datetime.now().strftime("%m%d%Y_%H%M%S")
file_name = f'CRTL_JP_staging_products_data_{date_time}.xlsx'

products_summary = []
max_count_of_products=20

def scroll_and_click_view_more(driver,href):
    flag=False
    last_height = driver.execute_script("return window.pageYOffset + window.innerHeight")               
    while True:
        try:                                                                   
            driver.execute_script("window.scrollBy(0, 800);")
            time.sleep(4)
            new_height1 = driver.execute_script("return window.pageYOffset + window.innerHeight")
            try:
                WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.product-tile')))
            except Exception as e:
                new_height = driver.execute_script("return window.pageYOffset + window.innerHeight")
                if new_height1 == last_height and not flag:
                    print("Reached the end of the page and no product tiles were found: ",href)
                    return "No product tiles found" 
                else:
                    last_height = new_height
                continue                                                                      
            div_count = 0
            flag = True                                                                  # at least one product tile has been seen on this page
            #while div_count >= 0:                                                           
            response = driver.page_source                                             
            soup = BeautifulSoup(response, 'html.parser')                                
            div_elements = soup.find_all('div', class_ = 'product-tile')                              
            div_count = len(div_elements)                                                 
            if(div_count > max_count_of_products):                                                         
                return(driver.page_source)
            driver.execute_script("window.scrollBy(0, 300);")
            time.sleep(3) 
            new_height = driver.execute_script("return window.pageYOffset + window.innerHeight")
            #print(new_height)
            if new_height == last_height:
                print("Reached the end of the page: ", href)
                return "Reached the end of the page."
            else:
                last_height = new_height
        except Exception as e:                                                               
            print(e)
            break

def prod_vitals(soup,title,url):                                                                                                                                  
    count_of_items=1
    products_data = []                                                        # Array to store all product data for our excel sheet
    for div in soup.find_all('div', class_ = 'product-tile'):                            # Iterate over each individual product-tile div tag
        if count_of_items<=max_count_of_products:
            #print(title)
            list_price = 0                                                                   # Variable to store list price
            sale_price = 0                                                                   # Variable to store sale price
            discount1 = 0                                                                    # Variable to store discount% that is displayed on the site
            discount2 = 0                                                                    # Variable to store discount% calculated manually
            count_of_items = count_of_items + 1
            res = "Incorrect"                                                                # Variable to store result of discount1==discount2; initialized with Incorrect
            #pro_code = div.select('div.css-1fg6eq7 img')[0]['id']
            pro_name = div.select('div.product-name a.css-avqw6d p.css-1d5mpur')[0].get_text()
            pdpurl = div.select('div.css-grdrdu a.css-avqw6d')[0]['href']
            pdpurl = url+pdpurl
            element = div.select('div.salePriceWrapper span.salesPrice')                     # Extract all the salesPrice span elements inside salePriceWrapper div (Ideally only one should be present) "<span class="chakra-text salesPrice false css-1gi2nbo" data-qa="m_plp_txt_pt_price_upper_rl">¥179000 </span>"
            if element:                                                                      # If sale price exists
                sale_price = float(element[0].get_text().replace('¥', '').replace(',', ''))                   # Take the text of the first matched element (the price including the yen sign), strip the yen sign and commas, and convert the result to a float
                res="Correct"
            element = div.select('div.comparablePriceWrapper span.css-l96gil')               # Similarly extract list price
            if element:
                list_price = float(element[0].get_text().replace('¥', '').replace(',', ''))
                percent_off = div.select('div.salePriceWrapper span.css-181q1zt')         # Similarly extract the DR% off text
                if percent_off:
                    percent_off = percent_off[0].get_text()
                    discount1 = re.search(r'\d+', percent_off).group()                                      # Extract only the digits from the DR% using the search function from regex library and group them together; return type is a string
                    discount1 = int(discount1)                                                              # Convert the DR% digits into an integer
                else:
                    percent_off = 0
                discount2 = round(((list_price - sale_price) / list_price) * 100)                       # Calculate the correct DR% manually using list price and sale price     
                if(discount1 == discount2):                                                                 # Check if DR% on site matches with the expected DR% or not
                    res = "Correct"                                                                     # If yes then store result as correct else Incorrect
                else:
                    res = "Incorrect"
            products_data.append({'Product Name': pro_name,'Product URL': pdpurl, 'Sale Price': '¥'+format(sale_price, '.2f'), 'List Price': '¥'+format(list_price, '.2f'), 'Discount on site': str(discount1)+'%', 'Actual Discount': str(discount2)+'%', 'Result': res})      # Append the extracted data to the list     
        else:
            break
    time.sleep(5)
    df = pd.DataFrame(products_data, columns=['Product Name', 'Product URL', 'Sale Price', 'List Price', 'Discount on site', 'Actual Discount', "Result" ])     # Convert the array along with specific column names to a pandas DataFrame; A DataFrame is a two-dimensional labeled data structure with columns potentially of different types
    if os.path.exists(file_name):
        book = openpyxl.load_workbook(file_name)
    else:
        book = Workbook()
        default_sheet = book.active
        book.remove(default_sheet)
    sheet = book.create_sheet(title)
    for row in dataframe_to_rows(df, index=False, header=True):
        sheet.append(row)
    yellow_fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
    green_fill = PatternFill(start_color='00FF00', end_color='00FF00', fill_type='solid')
    for row in range(2, sheet.max_row + 1):
        cell = sheet.cell(row=row, column=7)                          # 'Result' is the 7th column in the sheet
        if cell.value == "Correct":
            cell.fill = green_fill
        else:
            cell.fill = yellow_fill
    book.save(file_name)

def prod_count(soup,title):  
    product_count_element = soup.find('p', {'class': 'chakra-text total-count css-120gdxl', 'data-qa': 'plp_txt_resultcount'})
    if product_count_element:
        pro_count_text = product_count_element.get_text()
        pro_count_text = pro_count_text.replace(',', '')
        pro_count = re.search(r'\d+', pro_count_text).group()
        products_summary.append({'Category': title,'Total products available': pro_count, 'Total products scraped': max_count_of_products}) 

class WebScraper:
    def __init__(self):
        self.url = "https://staging1-japan.coach.com/?auto=true"
        self.reloved_url="https://staging1-japan.coach.com/shop/coachreloved/coach-reloved"
        self.driver = webdriver.Chrome()
        #options = Options()
        #options.add_argument("--lang=en")
        #self.driver = webdriver.Chrome(service=Service(r"c:\Users\DELL\Documents\Self_Project\chromedriver.exe"), options=options)
    def scrape(self):                                                                          
            self.driver.get(self.url)                                                                 
            self.driver.maximize_window()                                                            
            time.sleep(5)
            nav_count = 0
            soup = BeautifulSoup(self.driver.page_source, 'html.parser')
            links = soup.find('div', {'class': 'css-wnawyw'}).find_all('a', {'class': 'css-ipxypz'})
            hrefs = [link.get('href') for link in links]
            print(hrefs)
            for i,href in enumerate(hrefs):
                try:
                    #print(href)
                    element1 = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{href}"]')))
                    #self.driver.execute_script("arguments[0].scrollIntoView(true);", element1)
                    self.driver.execute_script("window.scrollTo(0, arguments[0].getBoundingClientRect().top + window.scrollY - 100);", element1)
                    time.sleep(10)
                    is_visible = self.driver.execute_script("return arguments[0].offsetParent !== null && arguments[0].getBoundingClientRect().top >= 0 && arguments[0].getBoundingClientRect().left >= 0 && arguments[0].getBoundingClientRect().bottom <= (window.innerHeight || document.documentElement.clientHeight) && arguments[0].getBoundingClientRect().right <= (window.innerWidth || document.documentElement.clientWidth);", element1)
                    #print("Displayed: {element1.is_displayed()}, Visible: {is_visible}")
                    WebDriverWait(self.driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'a[href="{href}"]'))).click()
                    time.sleep(3)
                    response = scroll_and_click_view_more(self.driver,href)
                    time.sleep(3)                                               
                    if(response!="No product tiles found" and response!="Reached the end of the page."):
                        print("Link: \n",href)
                        soup = BeautifulSoup(response, 'html.parser')
                        PLP_title=links[nav_count].get('title')
                        prod_vitals(soup,PLP_title,self.url)
                        time.sleep(5)
                        prod_count(soup,PLP_title)
                        self.driver.execute_script("window.scrollBy(0, -500);")
                    else:
                        self.driver.execute_script("window.scrollTo(0,0);")
                        #element2 = WebDriverWait(self.driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{hrefs[i+1]}"]')))
                        #self.driver.execute_script("window.scrollTo(0, arguments[0].getBoundingClientRect().top + window.scrollY - 100);", element2)
                        #time.sleep(3)
                        #is_visible = self.driver.execute_script("return arguments[0].offsetParent !== null && arguments[0].getBoundingClientRect().top >= 0 && arguments[0].getBoundingClientRect().left >= 0 && arguments[0].getBoundingClientRect().bottom <= (window.innerHeight || document.documentElement.clientHeight) && arguments[0].getBoundingClientRect().right <= (window.innerWidth || document.documentElement.clientWidth);", element2)
                        #print(f"Element href: {hrefs[i+1]}, Displayed: {element2.is_displayed()}, Visible: {is_visible}")
                        time.sleep(3)
                        continue
                except TimeoutException:
                    print(f"Element with href {href} not clickable")
                    self.driver.save_screenshot('timeout_exception.png')
                except Exception as e:
                    print(f"An error occurred: {e}")
                nav_count+=1
            df = pd.DataFrame(products_summary, columns=['Category', 'Total products available','Total products scraped'])
            book = openpyxl.load_workbook(file_name)
            sheet = book.create_sheet('Summary')
            for row in dataframe_to_rows(df, index=False, header=True):
                sheet.append(row)
            book.save(file_name)  
scraper = WebScraper()
scraper.scrape()                       
time.sleep(5)                         
scraper.driver.quit()                

Please find my updated code below, as per @mehdi-ahmadi's comment, along with the output and the issues I am facing now.

I initially tried your first option, but that did not work fine, so I decided to change the logic instead and tried the second option, getting the anchors from the nav each time. With this logic, the second link ('/shop/new/women') is clickable now. However, the last link (/shop/coachreloved/coach-reloved) is again getting a TimeoutException and the script is not able to click on it.

Please find the output below:

0 /feature/unlock-your-courage.html
Reached the end of the page and no product tiles were found:  /feature/unlock-your-courage.html
nav_count 1
1 /shop/new/women
nav_count 2
2 /shop/women
nav_count 3
3 /shop/men/bags
nav_count 4
4 /shop/collection
nav_count 5
5 /shop/gift/women/bestseller
nav_count 6
6 /shop/coachworld
Reached the end of the page and no product tiles were found:  /shop/coachworld
nav_count 7
Element with href /shop/coachreloved/coach-reloved not clickable

I am also attaching my updated class below. Can you please help?

def scrape(self):
    self.driver.get(self.url)
    self.driver.maximize_window()
    time.sleep(5)
    nav_count = 0
    while True:
        try:
            # Refresh the page source and parse it
            soup = BeautifulSoup(self.driver.page_source, 'html.parser')
            links = soup.find('div', {'class': 'css-wnawyw'}).find_all('a', {'class': 'css-ipxypz'})
            hrefs = [link.get('href') for link in links]
            # Check if nav_count is within the range of hrefs
            if nav_count < len(hrefs):
                href = hrefs[nav_count]
                time.sleep(2)
                element = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{href}"]')))
                self.driver.execute_script("arguments[0].scrollIntoView(true);", element)
                time.sleep(3)
                WebDriverWait(self.driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'a[href="{href}"]'))).click()
                time.sleep(3)
                print(nav_count, href)
                response = scroll_and_click_view_more(self.driver, href)
                time.sleep(3)
                if response != "No product tiles found" and response != "Reached the end of the page.":
                    #print("Link: \n", href)
                    soup = BeautifulSoup(response, 'html.parser')
                    PLP_title = links[nav_count].get('title')
                    prod_vitals(soup, PLP_title, self.url)
                    time.sleep(5)
                    prod_count(soup, PLP_title)
                    self.driver.execute_script("window.scrollBy(0, -500);")
                    time.sleep(2)
                else:
                    self.driver.get(self.url)
                    time.sleep(5)
                    continue
            else:
                break
        except TimeoutException:
            print(f"Element with href {href} not clickable")
            self.driver.save_screenshot('timeout_exception.png')
        except Exception as e:
            print(f"An error occurred: {e}")
        finally:
            nav_count += 1
            print("nav_count", nav_count)

Solution

  • The problem is that you get the anchor tags from https://staging1-japan.coach.com/?auto=true and save them in a list, but when you are on the page https://staging1-japan.coach.com/feature/unlock-your-courage.html you try to click an anchor tag that lives on https://staging1-japan.coach.com/?auto=true, and that is not possible. You might say that both anchors point to the same address or are completely identical, but that has no meaning for the browser. They are two anchors on two separate pages, and you cannot click on something that is on another page while you are on a different page.
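
    A minimal sketch of that rule, reusing the question's selectors (the re-locate step is illustrative, not part of the original code): a WebElement found on one document goes stale once the browser navigates away, and using it raises StaleElementReferenceException, so it has to be located again on the new page.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import StaleElementReferenceException

    driver = webdriver.Chrome()
    driver.get("https://staging1-japan.coach.com/?auto=true")
    link = driver.find_element(By.CSS_SELECTOR, 'a.css-ipxypz')          # anchor located on the home page
    driver.get("https://staging1-japan.coach.com/feature/unlock-your-courage.html")
    try:
        link.click()                                                     # this reference still points at the previous document
    except StaleElementReferenceException:
        # re-locate the anchor on the page we are on now (assuming the same nav markup exists here)
        link = driver.find_element(By.CSS_SELECTOR, 'a.css-ipxypz')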

    So one solution is to load the page you read the anchors from.

    In class WebScraper, method scrape, in the for loop for i,href in enumerate(hrefs): you can add this code: self.driver.get(self.url). Sorry, this is messy code and I can't rewrite all of it for you; this is just the part of your code that shows how to change it:

    for i,href in enumerate(hrefs):
                    try:
                        ##########new line added##########
                        self.driver.get(self.url)
                        ##################################
                        #print(href)
                        element1 = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{href}"]')))
                        #self.driver.execute_script("arguments[0].scrollIntoView(true);", element1)
                        self.driver.execute_script("window.scrollTo(0, arguments[0].getBoundingClientRect().top + window.scrollY - 100);", element1)
                        time.sleep(10)
    

    The second solution is to get the anchors from the nav each time, on each page you are on, if you are sure the anchors are present on all pages and are the same; see the sketch below.
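
    A minimal sketch of that second approach, reusing the selectors and waits from the question (it assumes the same nav markup is rendered on every page):

    nav_count = 0
    while True:
        # re-read the nav anchors from whatever page the driver is currently on
        soup = BeautifulSoup(self.driver.page_source, 'html.parser')
        nav_links = soup.find('div', {'class': 'css-wnawyw'}).find_all('a', {'class': 'css-ipxypz'})
        hrefs = [a.get('href') for a in nav_links]
        if nav_count >= len(hrefs):
            break
        href = hrefs[nav_count]
        # locate the anchor again on the current page before clicking it
        WebDriverWait(self.driver, 30).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, f'a[href="{href}"]'))
        ).click()
        nav_count += 1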

    The third is to open each link in a new tab and go back to the first tab at the end (note that this also needs from selenium.webdriver.common.keys import Keys among the imports), so your code will be:

    def scrape(self):                                                                          
            self.driver.get(self.url)                                                                 
            # self.driver.maximize_window()                                                            
            time.sleep(5)
            nav_count = 0
            soup = BeautifulSoup(self.driver.page_source, 'html.parser')
            links = soup.find('div', {'class': 'css-wnawyw'}).find_all('a', {'class': 'css-ipxypz'})
            hrefs = [link.get('href') for link in links][-2:]         # note: [-2:] keeps only the last two links (presumably left in from testing)
            mainWindow = self.driver.window_handles[0]
            for i,href in enumerate(hrefs):
                try:
                    #print(href)
                    self.driver.switch_to.window(mainWindow)
                    # abslute_url = self.driver.get+href
                    element1 = WebDriverWait(self.driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{href}"]')))
                    
                    #self.driver.execute_script("arguments[0].scrollIntoView(true);", element1)
                    self.driver.execute_script("window.scrollTo(0, arguments[0].getBoundingClientRect().top + window.scrollY - 100);", element1)
                    time.sleep(10)
                    # is_visible = self.driver.execute_script("return arguments[0].offsetParent !== null && arguments[0].getBoundingClientRect().top >= 0 && arguments[0].getBoundingClientRect().left >= 0 && arguments[0].getBoundingClientRect().bottom <= (window.innerHeight || document.documentElement.clientHeight) && arguments[0].getBoundingClientRect().right <= (window.innerWidth || document.documentElement.clientWidth);", element1)
                    #print("Displayed: {element1.is_displayed()}, Visible: {is_visible}")
                    WebDriverWait(self.driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'a[href="{href}"]'))).send_keys(Keys.CONTROL+Keys.ENTER)
                    newTab = self.driver.window_handles[-1]
                    self.driver.switch_to.window(newTab)
                    
                    time.sleep(3)
                    response = scroll_and_click_view_more(self.driver,href)
                    time.sleep(3)                                               
                    if(response!="No product tiles found" and response!="Reached the end of the page."):
                        print("Link: \n",href)
                        soup = BeautifulSoup(response, 'html.parser')
                        PLP_title=links[nav_count].get('title')
                        prod_vitals(soup,PLP_title,self.url)
                        time.sleep(5)
                        prod_count(soup,PLP_title)
                        self.driver.execute_script("window.scrollBy(0, -500);")
                    else:
                        self.driver.execute_script("window.scrollTo(0,0);")
                        #element2 = WebDriverWait(self.driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, f'a[href="{hrefs[i+1]}"]')))
                        #self.driver.execute_script("window.scrollTo(0, arguments[0].getBoundingClientRect().top + window.scrollY - 100);", element2)
                        #time.sleep(3)
                        #is_visible = self.driver.execute_script("return arguments[0].offsetParent !== null && arguments[0].getBoundingClientRect().top >= 0 && arguments[0].getBoundingClientRect().left >= 0 && arguments[0].getBoundingClientRect().bottom <= (window.innerHeight || document.documentElement.clientHeight) && arguments[0].getBoundingClientRect().right <= (window.innerWidth || document.documentElement.clientWidth);", element2)
                        #print(f"Element href: {hrefs[i+1]}, Displayed: {element2.is_displayed()}, Visible: {is_visible}")
                        time.sleep(3)
                        continue
                except TimeoutException:
                    print(f"Element with href {href} not clickable")
                    self.driver.save_screenshot('timeout_exception.png')
                except Exception as e:
                    print(f"An error occurred: {e}")
                nav_count+=1
                self.driver.close()                                   # close the tab opened for this category before moving on
            df = pd.DataFrame(products_summary, columns=['Category', 'Total products available','Total products scraped'])
            book = openpyxl.load_workbook(file_name)
            sheet = book.create_sheet('Summary')
            for row in dataframe_to_rows(df, index=False, header=True):
                sheet.append(row)
            book.save(file_name)  
    
    

    This gives you the list of open tab handles:

    self.driver.window_handles
    

    Use this to switch to another tab:

    self.driver.switch_to.window(mainWindow)
    

    Use this to close the currently active tab (if the current tab is the only tab, the close method works like the quit method):

    self.driver.close()
    

    Update

    To open a link in a new browser tab you can use send_keys(Keys.CONTROL + Keys.ENTER) instead of click():

    WebDriverWait(self.driver, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f'a[href="{href}"]'))).send_keys(Keys.CONTROL+Keys.ENTER)
    

    Or you can use execute_script, as @Annie says in the comments:

    self.driver.execute_script(f"window.open('{href}', '_blank');")
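
    Either way, the browser does not focus the new tab automatically; switch to it through the handle list, as in the code above:

    newTab = self.driver.window_handles[-1]       # the tab that was just opened
    self.driver.switch_to.window(newTab)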