python scrapy scrapy-shell

Flipkart.com image src is changing after scraping


I was trying to scrape flipkart.com purely for learning purposes and ran into a problem I can't explain. I wanted the image src from this page: https://www.flipkart.com/search?q=sofa. In dev tools, the element looks like this:

<img class="_396cs4 _3exPp9" alt="Muebles Casa Croma Leatherette 3 Seater  Sofa" src="https://rukminim1.flixcart.com/image/612/612/jvtujrk0/sofa-sectional/z/w/h/light-brown-na-colton-letheratte-light-brown-three-seater-sofa-original-imafghzgwdznm33t.jpeg?q=70">

But when I ran the query in the Scrapy shell, I got a different value:

In [1]: response.xpath('//div[@class="CXW8mj _21_khk"]/img/@src').get()
Out[1]: '//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_fcebae.svg'

Can anyone tell me why the src is changing and how to solve this?


Solution

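  • The image URLs are loaded dynamically, so the HTML that Scrapy downloads only contains a placeholder src. You need a real browser, for example through Selenium, to get the rendered page. Here I use Selenium to load the page and scrapy.Selector to extract the data.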

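    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service
    from scrapy.selector import Selector

    # Recent Selenium 4 releases take the driver path via Service;
    # the executable_path argument of webdriver.Firefox was removed.
    browser = webdriver.Firefox(service=Service(executable_path='./geckodriver'))
    browser.get("https://www.flipkart.com/search?q=sofa")

    # Grab the browser-rendered HTML, then close the browser.
    page = browser.page_source
    browser.quit()

    # Parse the rendered HTML with a Scrapy Selector and run the same XPath.
    image_data = Selector(text=page)
    print(image_data.xpath('//div[@class="CXW8mj _21_khk"]/img/@src').get())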
    

    Output:

    https://rukminim1.flixcart.com/image/612/612/jyeq64w0/sofa-sectional/u/k/b/blue-na-56101502sd00927-godrej-interio-blue-original-imaffpsrybgrhvxb.jpeg?q=70
    

    Note: install Selenium first if it is not already on your system (for example, pip install selenium).
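
To check whether a page needs a browser at all, it can help to compare the src values in the raw response with what dev tools shows. Below is a minimal sketch using only the standard library. The two HTML snippets are trimmed-down stand-ins built from the values quoted above, and the names ImgSrcCollector, img_srcs and is_placeholder are hypothetical helpers. The "placeholder" substring check is a heuristic assumption based on the URL seen in the question, not something Flipkart documents.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def img_srcs(html):
    """Return all <img> src values found in an HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


def is_placeholder(src):
    # Heuristic assumption: the lazy-load image has "placeholder" in its URL,
    # as in the value returned by the Scrapy shell in the question.
    return "placeholder" in src.lower()


# Stand-in for the HTML that Scrapy downloads (no JavaScript executed).
RAW_HTML = (
    '<div class="CXW8mj _21_khk">'
    '<img src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_fcebae.svg">'
    '</div>'
)

# Stand-in for the HTML after the browser has rendered the page.
RENDERED_HTML = (
    '<div class="CXW8mj _21_khk">'
    '<img class="_396cs4 _3exPp9" src="https://rukminim1.flixcart.com/image/612/612/'
    'jyeq64w0/sofa-sectional/u/k/b/blue-na-56101502sd00927-godrej-interio-blue-'
    'original-imaffpsrybgrhvxb.jpeg?q=70">'
    '</div>'
)

raw_srcs = img_srcs(RAW_HTML)
rendered_srcs = img_srcs(RENDERED_HTML)

# If every src in the raw response is a placeholder, the real URLs are
# filled in later and a browser-rendered page is needed.
needs_browser = all(is_placeholder(s) for s in raw_srcs)
print("raw response only has placeholders:", needs_browser)
print("rendered page has real images:", not any(is_placeholder(s) for s in rendered_srcs))
```

In a real spider, the same check could be applied to response.xpath(...).getall() before deciding whether to fall back to Selenium.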