In the picture below, the src attribute holds the full link of the image,
but when using BeautifulSoup I get this output:
image['src']
assets/images/content/TUL_5890.jpg
Could you please let me know how to extract the image link in such a case?
I think that is because of the onerror attribute in the code, but I don't know how to fix it.
If you look at the response HTML present in soup,
<a class="img-wrapper fancybox" data-caption="Pedestrian Crosswalk Sign" data-fancybox="group" href="assets/images/content/street_view_1a.jpg">
<img alt="Pedestrian Crosswalk Sign" src="assets/images/content/street_view_1a.jpg"/>
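the src does not contain the full URL that you see in Chrome; the browser probably resolves the relative path into an absolute one. That is why you were not getting the full path. You will have to extract the src attribute of each tag and concatenate it with the site's base URL (https://www.pexco.com/).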
from bs4 import BeautifulSoup
import requests

response = requests.get('https://www.pexco.com/traffic/products/pedestrian-safety-products/in-street-pedestrian-crosswalk-signs/')
soup = BeautifulSoup(response.text, 'lxml')

for imgTag in soup.find_all('img'):
    img_src = imgTag['src']
    # Relative paths on this site start with 'assets'; prefix them with the site root.
    if 'assets' in img_src:
        print('https://www.pexco.com/' + img_src)
    else:
        # Already an absolute URL (e.g. a third-party tracking pixel).
        print(img_src)
This gives us:
https://www.webtraxs.com/webtraxs.php?id=pexco&st=img
https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
https://www.pexco.com/assets/images/banners/bg-banner-traffic-desktop.jpg
https://www.pexco.com/assets/images/content/TUL_5890.jpg
https://www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
https://www.pexco.com/assets/images/content/P0000689.jpg
https://www.pexco.com/assets/images/content/street_view_1a.jpg
https://www.pexco.com/assets/images/content/street_view_2a.jpg
https://www.pexco.com/assets/images/content/TUL_5890.jpg
https://www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
https://www.pexco.com/assets/images/content/P0000689.jpg
https://www.pexco.com/assets/images/content/street_view_1a.jpg
https://www.pexco.com/assets/images/content/street_view_2a.jpg
https://www.pexco.com/assets/images/content/CADdetails_Microsite_Button.jpg
https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
https://www.pexco.com/assets/images/template/fb-icon.jpg
https://www.pexco.com/assets/images/template/LI-icon.jpg
https://www.pexco.com/assets/images/template/YT-icon.jpg
https://px.ads.linkedin.com/collect/?pid=2856522&fmt=gif
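As an alternative to string concatenation and the 'assets' check, the standard library's urllib.parse.urljoin can do the resolution. This is only a minimal sketch: BASE_URL and the helper name to_absolute are assumptions made for illustration, and the site root is assumed as the base because that is what matches the output above. Resolving against the full page URL instead would place the images under /traffic/products/..., which is not where they live.

```python
from urllib.parse import urljoin

# Assumption: relative src values on this site resolve against the site root.
BASE_URL = 'https://www.pexco.com/'


def to_absolute(src, base=BASE_URL):
    """Resolve a possibly relative src against base.

    Absolute URLs are returned unchanged, so no 'assets' check is needed.
    """
    return urljoin(base, src)


# Sample src values taken from the output above.
srcs = [
    'assets/images/content/TUL_5890.jpg',
    'https://www.webtraxs.com/webtraxs.php?id=pexco&st=img',
]
for src in srcs:
    print(to_absolute(src))
```

In the loop from the answer, print(to_absolute(imgTag['src'])) would replace the if/else branch.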
EDIT:
As discussed with the OP, she needs a solution that returns the full URL directly. Selenium can be used in this case, since it reads src from the page as loaded in the browser.
Please try the following code.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
chrome_path = r"C:\Users\hpoddar\Desktop\Tools\chromedriver_win32\chromedriver.exe"  # PUT YOUR CHROMEDRIVER PATH HERE
s = Service(chrome_path)
url = 'https://www.pexco.com/traffic/products/pedestrian-safety-products/in-street-pedestrian-crosswalk-signs/'
driver = webdriver.Chrome(service=s)
driver.get(url)
images = driver.find_elements(By.TAG_NAME, 'img')
for image in images:
    print(image.get_attribute('src'))
driver.quit()  # close the browser when done
This gives us the expected output:
https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/banners/bg-banner-traffic-desktop.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/TUL_5890.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/P0000689.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_1a.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_2a.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/TUL_5890.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/Davidson_STOP_4_Ped_Sign_Atlanta_012309.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/P0000689.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_1a.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/street_view_2a.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/content/CADdetails_Microsite_Button.jpg
https://www.pexco.com/assets/images/template/pexco-logo-dark.svg
https://www.gstatic.com/images/branding/googlelogo/1x/googlelogo_color_42x16dp.png
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/fb-icon.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/LI-icon.jpg
https://marvel-b1-cdn.bc0a.com/f00000000266812/www.pexco.com/assets/images/template/YT-icon.jpg
https://www.gstatic.com/images/branding/product/1x/translate_24dp.png
https://marvel-b1-cdn.bc0a.com/f00000000266812/cdn1.thelivechatsoftware.com/assets/interchanges/pexco.com/resources/pexco_2021-11-18.03-09-45.png