python web-scraping beautifulsoup python-requests scrapinghub

How can I scrape an image link using Beautiful Soup and Python?


I am trying to scrape the image link from the page below, but I have not been able to.

Link : https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM

I used the code below:

from urllib.request import urlopen
from bs4 import BeautifulSoup

x = 'https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM'
html = urlopen(x)
soup = BeautifulSoup(html, 'lxml')
print(soup.find('div', class_="m-top-sm block-hero-art-2 display-image"))

Output:

<img _ngcontent-c11="" alt="Citi Logo" class="logo" crossorigin="anonymous" src="https://www.cdn.citibank.com/v1/ingcb/cbol/files/images/logos/logo.png?_bust=2021-01-21T05-05-29-195Z"/>

But the src I am getting is the wrong link: it points to the logo, not to the card image.

The highlighted part of the HTML code is where the image link resides. I would be glad to get the right code to scrape that image link.

[Screenshot: image to be scraped, with its tag]

Which tag should I use to get that exact image link?

Could anyone help me with alternative code that gives the desired result?


Solution

  • As @baduker noted in a comment, the card image is added dynamically by JavaScript, so bs4 doesn't see it in the source HTML that urlopen returns. Try selenium together with bs4, so that a real browser renders the page first:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    x = 'https://www.online.citibank.co.in/credit-card/rewards/citi-rewards-credit-card?eOfferCode=INCCCCTWAFCTRELM'
    wb = webdriver.Chrome()
    wb.get(x)  # the browser runs the page's JavaScript

    # Parse the rendered page, not the raw source that urlopen would return.
    soup = BeautifulSoup(wb.page_source, 'lxml')
    wb.quit()

    # The card image sits inside this div once the page has rendered.
    div = soup.find('div', class_="m-top-sm block-hero-art-2 display-image")
    print(div)
    print(div.find('img').get('src'))


    To install selenium, run this in your terminal:

    pip install selenium
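
Because the image is injected by JavaScript, reading page_source right after get() can still miss it if the script has not finished. The following is a minimal sketch of one way to handle that, not a tested recipe for this particular site. It waits explicitly for the image and keeps the parsing step separate. The helper names (extract_image_src, fetch_rendered_html), the 15-second timeout, and the CSS selector are assumptions made for illustration. Only the div class string comes from the question.

```python
from bs4 import BeautifulSoup

# Class string copied from the question; everything else here is illustrative.
HERO_DIV_CLASS = "m-top-sm block-hero-art-2 display-image"


def extract_image_src(html, div_class=HERO_DIV_CLASS):
    """Return the src of the first <img> inside the target div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=div_class)
    if div is None:
        return None
    img = div.find("img")
    return img.get("src") if img else None


def fetch_rendered_html(url, timeout=15):
    """Load url in Chrome and wait until the target div contains an <img>."""
    # Imported here so the parsing helper works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumed selector: an <img> inside the hero-art div from the question.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.block-hero-art-2.display-image img")
            )
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on a timeout
```

Calling extract_image_src(fetch_rendered_html(x)) should then give the src once the image has loaded. If the wait times out, selenium raises a TimeoutException, which usually means the selector needs adjusting for the live page.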