I was trying to scrape flipkart.com purely for learning purposes, and I've run into a problem I can't explain. I was trying to scrape the image src from this page, https://www.flipkart.com/search?q=sofa, and in dev tools the element was:
<img class="_396cs4 _3exPp9" alt="Muebles Casa Croma Leatherette 3 Seater Sofa" src="https://rukminim1.flixcart.com/image/612/612/jvtujrk0/sofa-sectional/z/w/h/light-brown-na-colton-letheratte-light-brown-three-seater-sofa-original-imafghzgwdznm33t.jpeg?q=70">
but when I tried to scrape it in the Scrapy shell, I got a different value:
In [1]: response.xpath('//div[@class="CXW8mj _21_khk"]/img/@src').get()
Out[1]: '//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_fcebae.svg'
Can anyone tell me why the src is changing, and how to solve this problem?
You need to use Selenium to get the data. The image data is loaded dynamically, so the HTML that Scrapy downloads only contains a placeholder src. Here I use scrapy.Selector together with Selenium to extract the data:
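from selenium import webdriver
from scrapy.selector import Selector

# Selenium 3 style; Selenium 4 takes service=Service('./geckodriver') instead of executable_path
browser = webdriver.Firefox(executable_path='./geckodriver')
browser.get(url="https://www.flipkart.com/search?q=sofa")
# page_source is the HTML of the page as currently rendered by the browser
page = browser.page_source
browser.quit()  # close the browser once the HTML has been captured
image_data = Selector(text=page)
print(image_data.xpath('//div[@class="CXW8mj _21_khk"]/img/@src').get())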
Output:
https://rukminim1.flixcart.com/image/612/612/jyeq64w0/sofa-sectional/u/k/b/blue-na-56101502sd00927-godrej-interio-blue-original-imaffpsrybgrhvxb.jpeg?q=70
Note: You need to install Selenium if it is not already installed on your system.
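If you want to check whether a given response still holds placeholders before reaching for a browser, something like the following might help. It is only a rough sketch using the standard library: the helper names (`ImgSrcCollector`, `image_sources`, `is_placeholder`) are made up for illustration, the two HTML strings are trimmed-down stand-ins built from the src values in this thread, and the rule "the file name contains `placeholder`" is an assumption based on the one placeholder URL shown in the question.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return the src of every <img> in the given HTML string."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def is_placeholder(src):
    # Assumption: placeholder images have "placeholder" in the file name,
    # as in the URL returned by the Scrapy shell in the question.
    return "placeholder" in src.rsplit("/", 1)[-1]


# Stand-in for what Scrapy downloads (src taken from the question).
RAW_HTML = (
    '<div class="CXW8mj _21_khk">'
    '<img src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_fcebae.svg">'
    "</div>"
)

# Stand-in for what the browser holds after rendering (src taken from the answer).
RENDERED_HTML = (
    '<div class="CXW8mj _21_khk">'
    '<img src="https://rukminim1.flixcart.com/image/612/612/jyeq64w0/sofa-sectional/'
    'u/k/b/blue-na-56101502sd00927-godrej-interio-blue-original-imaffpsrybgrhvxb.jpeg?q=70">'
    "</div>"
)

for label, html in (("raw", RAW_HTML), ("rendered", RENDERED_HTML)):
    for src in image_sources(html):
        print(label, "placeholder" if is_placeholder(src) else "real image", src)
```

In practice you would pass `response.text` from Scrapy or `browser.page_source` from Selenium to `image_sources` instead of the sample strings. If every src comes back as a placeholder, the real URLs are most likely being filled in after the page loads.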