
Scrapy Splash only gets part of the data

I'm getting this error when I run my scraper:

2022-09-19 23:17:00 [scrapy.core.scraper] ERROR: Spider error processing <GET> (referer:
Traceback (most recent call last):
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\utils\", line 120, in iter_errback
    yield next(it)
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\utils\", line 353, in __next__
    return next(
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\utils\", line 353, in __next__
    return next(
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\core\", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\spidermiddlewares\", line 29, in process_spider_output
    for x in result:
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\core\", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\spidermiddlewares\", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\core\", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\spidermiddlewares\", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\core\", line 56, in _evaluate_iterable
    for r in iterable:
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\spidermiddlewares\", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\scrapy\core\", line 56, in _evaluate_iterable
    for r in iterable:
  File "c:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\just_for_sport\just_for_sport\spiders\", line 41, in parse_article_detail
  File "C:\Users\User\Desktop\Personal\DABRA\Scraper_jfs\venv\lib\site-packages\parsel\", line 70, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range

I have tried to understand what it means, but I can't find the problem. The link works fine, but the data is not collected.

My script looks like this:

import scrapy
from scrapy_splash import SplashRequest
from concurrent.futures import process
from scrapy.crawler import CrawlerProcess
from datetime import datetime
import os

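# Remove the output of a previous run so the new feed starts from an empty file
if os.path.exists('jfs_mujer.csv'):
    os.remove('jfs_mujer.csv')
    print("The file has been deleted successfully")
else:
    print("The file does not exist!")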

class JfsSpider_mujer(scrapy.Spider):
    name = 'jfs_mujer'
    start_urls = [""]            

    def parse(self, response):
        # total_products = int(int(response.css(' span::text').get()) / 32) + 2
        for count in range(1, 40):
            yield SplashRequest(
                url=f'{count}',
                callback=self.parse_links,
                meta={'splash': {'endpoint': 'execute', 'args': {'wait': 0.5}}},
            )

    # Extract the links from each page of the section
    def parse_links(self, response):
        for link in links:
            yield SplashRequest(
                response.urljoin('' + link),
                self.parse_article_detail,
                meta={'splash': {'endpoint': 'execute', 'args': {'wait': 0.5}}},
            )

    def parse_article_detail(self, response):
        yield {
            'Sku': response.css('span.vtex-product-identifier-0-x-product-identifier__value::text').get(),
            'Name': response.css('span.vtex-store-components-3-x-productBrand::text').get(),
        }

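process = CrawlerProcess(
    settings={
        'FEED_URI': 'jfs_mujer.csv',
        'FEED_FORMAT': 'csv',
        'USER_AGENT': 'Googlebot/2.1 (+',
    }
)
# Schedule the spider and start the crawl (blocks until it finishes)
process.crawl(JfsSpider_mujer)
process.start()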

What's wrong with the script? Or is it something about the settings? I think it has something to do with the way I join the prices, but out of 770 products it works fine for almost 660, and I don't understand why. Thanks for your help!


  • The IndexError means that one of your CSS selectors matched nothing on that page, so indexing the empty result fails. You can try the XPath below to get the price:

    price = response.xpath('//meta[@property="product:price:amount"]/@content').get()
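
If the price element is missing on some product pages, any selector result indexed with [0] will raise the same IndexError, whereas .get() returns None when nothing matches. Below is a minimal sketch of one way to guard against that in plain Python. The helpers first_or_default and join_price are hypothetical names, not part of Scrapy or parsel, and the list arguments stand in for what response.css(...).getall() would return in your spider.

```python
def first_or_default(matches, default=None):
    """Return the first match, or `default` when nothing matched."""
    # An empty list (and an empty SelectorList, which is a list subclass)
    # is falsy, so the indexing step is skipped instead of raising IndexError.
    return matches[0] if matches else default


def join_price(parts, default=None):
    """Join the text fragments of a price, or return `default` if there are none."""
    # Assumption: the price is split across several text nodes that need joining.
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return "".join(cleaned) if cleaned else default


if __name__ == "__main__":
    print(first_or_default([]))            # no match: falls back to the default
    print(first_or_default(["$ 1.299"]))   # at least one match: first one wins
    print(join_price(["$", " 1.299 "]))    # fragments are stripped, then joined
    print(join_price([]))                  # no fragments: falls back to the default
```

With something like this, a product without a price would be exported with an empty price field instead of aborting the callback, which makes it easier to see which pages are affected.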