I'm doing a tutorial where the task is to download pictures from Google Images using Python and Selenium, but I have run into a problem.
import bs4
import requests
from selenium import webdriver
import os
import time
chromeDriverPath=r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
driver=webdriver.Chrome(chromeDriverPath)
search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'
driver.get(search_URL)
a = input('Waiting for user input to start...')
# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')
page_html = driver.page_source
pageSoup = bs4.BeautifulSoup(page_html, 'html.parser')
containers = pageSoup.findAll('div', {'class':'isv-r PNCib MSM1fd BUooTd'})
len_containers = len(containers)
print('Found %s image containers'%(len_containers))
xPath1 = '//*[@id="islrg"]/div[1]/div[13]'
for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    xPath2 = xPath1 + str(i)
    driver.find_element("xpath", xPath2).click()
and I got this error:
InvalidSelectorException: invalid selector: Unable to locate an element with the xpath expression //*[@id="islrg"]/div[1]/div[13]1 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.
Did I pick a bad div, should I add str() or .text somewhere, or is the XPath itself wrong? When I choose a single picture and call .click() on it, it works.
The error message shows exactly what went wrong.
The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.
You took an XPath
xPath1 = '//*[@id="islrg"]/div[1]/div[13]'
and then appended '1' to it in the line below (because i is 1):
xPath2 = xPath1 + str(i)
which becomes
'//*[@id="islrg"]/div[1]/div[13]' + '1'
'//*[@id="islrg"]/div[1]/div[13]1'
which is the exact string from the error message. The problem is that this is not a valid XPath... the final '1' at the end of the string makes it invalid.
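Assuming the tutorial's intent was to address each result tile by its index, the fix is to put i inside an XPath predicate rather than gluing it onto the end of the string. A minimal sketch of just the string construction (the islrg container id is taken from your original XPath; the trailing /div[13] in your xPath1 looks like it was copied from one specific tile and should be the part that varies):

```python
xPath1 = '//*[@id="islrg"]/div[1]'

for i in range(1, 4):
    # Put the index inside the brackets, producing a valid XPath
    # expression such as //*[@id="islrg"]/div[1]/div[2]
    xPath2 = f'{xPath1}/div[{i}]'
    print(xPath2)
```

With the index inside the predicate, each generated string is a syntactically valid XPath, so driver.find_element("xpath", xPath2) no longer raises InvalidSelectorException.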
After reviewing your entire script, I think there's a simpler way to approach this. Right now you've got BeautifulSoup in your script but it's not needed... you can get all of this using Selenium alone, simplifying everything.
One issue I ran into while writing this script is that the images take a moment to load. We can't use a standard WebDriverWait
here because we don't know how many images are going to appear. So, we write a method that polls the page every 100ms to see if the count of images has gone up. We keep looping until the count is stable, meaning all the images have loaded.
def wait_for_images(locator):
    count = 0
    images = driver.find_elements(*locator)
    while len(images) != count:
        count = len(images)
        time.sleep(.1)
        images = driver.find_elements(*locator)
    return images
Now that we have the helper method, we can write the main script:
from selenium.webdriver.common.by import By

chromeDriverPath = r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
driver = webdriver.Chrome(chromeDriverPath)
search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'
driver.get(search_URL)
a = input('Waiting for user input to start...')
# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')
for image in wait_for_images((By.CSS_SELECTOR, ".bRMDJf.islir > img[src]")):
    print(image.get_attribute("src"))
This prints the URL of each image; you can then navigate to each one separately and download it, or do whatever else you need with it.
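If you want to go a step further and save the files, here is a hedged sketch of a downloader. Note that the src of a Google Images thumbnail can be either a regular http(s) URL or an inline base64 data: URI, so the sketch handles both. download_image is a helper name I've made up for illustration; requests is the library already imported in your original script.

```python
import base64

import requests


def download_image(src, path):
    # Thumbnail src values are either http(s) URLs or inline
    # base64-encoded data: URIs, so handle both cases.
    if src.startswith('data:'):
        # Format: data:image/jpeg;base64,<payload>
        payload = src.split(',', 1)[1]
        data = base64.b64decode(payload)
    else:
        response = requests.get(src, timeout=10)
        response.raise_for_status()
        data = response.content
    with open(path, 'wb') as f:
        f.write(data)
```

Inside the loop above you would then call something like download_image(image.get_attribute("src"), f'image_{n}.jpg'), with n coming from enumerate.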