python html selenium web-scraping

Selenium won't scrape images


I'm trying to scrape images from a website using Selenium in Python, but I have been having trouble locating the image.

Code:

driver = webdriver.Chrome('chromedriver')
begining_of_url = "https://lookup.guru/"
whole_url = begining_of_url + str(target_id)
driver.get(whole_url)
images = driver.find_elements_by_tag_name('img')
for image in images:
    global pfp
    pfp = (image.get_attribute('src'))
    break
print(pfp)

The body of the for image in images loop never runs, since the value of pfp doesn't change (I found this by testing). I have also checked that the URL is correct. You can see that the page contains images such as

<img src="https://cdn.discordapp.com/avatars/763797441275232307/036457e7064e3268506f52756e45c973.png" alt="S3rene" class="h-full w-full object-cover object-center relative z-20">

in the HTML. I have tried waiting for both 5 and 10 seconds with time.sleep. The output from that is:

DevTools listening on ws://127.0.0.1:62910/devtools/browser/8fa57016-6d5f-4285-96bf-8192a0d8c073
[8416:11684:0130/112545.380:ERROR:chrome_browser_main_extra_parts_metrics.cc(227)] START: ReportBluetoothAvailability(). If you don't see the END: message, this is crbug.com/1216328.
[8416:11684:0130/112545.380:ERROR:chrome_browser_main_extra_parts_metrics.cc(230)] END: ReportBluetoothAvailability()
[8416:11684:0130/112545.380:ERROR:chrome_browser_main_extra_parts_metrics.cc(235)] START: GetDefaultBrowser(). If you don't see the END: message, this is crbug.com/1216328.
[8416:15240:0130/112545.386:ERROR:device_event_log_impl.cc(214)] [11:25:45.386] Bluetooth: bluetooth_adapter_winrt.cc:1075 Getting Default Adapter failed.
[8416:11684:0130/112545.394:ERROR:chrome_browser_main_extra_parts_metrics.cc(239)] END: GetDefaultBrowser()
D:\Atom Projects\DCF\main.py:38: DeprecationWarning: find_elements_by_* commands are deprecated. Please use find_elements() instead
  images = driver.find_elements_by_tag_name('img')

It prints a test message that I added, but doesn't find the images.

I'm not very good at web scraping and I'm eager to learn what I'm doing wrong. It is possible that the website is rendered with JavaScript, which is why I'm using Selenium. I have looked at many websites for assistance but have found no fixes. Thanks for any help.


Solution

  • Your code is missing a wait. You are trying to get all the img elements before the page has finished loading.
    The most reliable way to wait for elements to load is an explicit wait with Expected Conditions, which polls until the condition holds or the timeout expires.
    This should work better:

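    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Passing the driver path as a string matches the question's Selenium version;
    # later Selenium 4 releases expect a Service object or no argument instead.
    driver = webdriver.Chrome('chromedriver')
    begining_of_url = "https://lookup.guru/"
    target_id = "763797441275232307"
    whole_url = begining_of_url + str(target_id)
    driver.get(whole_url)

    # Block for up to 20 seconds until the first <img> element is visible.
    wait = WebDriverWait(driver, 20)
    wait.until(EC.visibility_of_element_located((By.XPATH, "//img")))

    # find_elements() replaces the deprecated find_elements_by_tag_name().
    images = driver.find_elements(By.TAG_NAME, 'img')
    pfp = ""
    for image in images:
        pfp = image.get_attribute('src')
        print(pfp)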
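
If you only want the profile picture rather than every image on the page, the same explicit wait can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation. The function names pick_avatar_src and fetch_avatar_src are hypothetical. It assumes the avatar's src contains "cdn.discordapp.com/avatars/", as in the img tag quoted in the question. It also assumes webdriver.Chrome() can find a driver on its own, either on PATH or through a recent Selenium release.

```python
from typing import Iterable, Optional


def pick_avatar_src(
    srcs: Iterable[Optional[str]],
    marker: str = "cdn.discordapp.com/avatars/",  # assumption taken from the question's <img> tag
) -> Optional[str]:
    """Return the first src that contains `marker`, or None if nothing matches."""
    for src in srcs:
        if src and marker in src:
            return src
    return None


def fetch_avatar_src(url: str, timeout: float = 20) -> Optional[str]:
    """Open `url`, wait for <img> elements to exist, and return the avatar src."""
    # Imported here so pick_avatar_src stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: a chromedriver is discoverable
    try:
        driver.get(url)
        # Raises TimeoutException if no <img> appears within `timeout` seconds.
        images = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, "img"))
        )
        return pick_avatar_src(img.get_attribute("src") for img in images)
    finally:
        driver.quit()  # always close the browser, even when the wait times out
```

Calling fetch_avatar_src("https://lookup.guru/763797441275232307") would then return a single URL or None. Filtering on the src also avoids picking up a logo or icon that happens to be the first img element on the page.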