[SOLVED] Scrapy with Splash still giving DEBUG: Crawled (200)

I'm new to Scrapy and can't figure out why I run into this problem when I run my code. I wrote it by following a simple tutorial and then added Splash. Splash is up and running.

This is the code:

livros.py

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from olx.items import OlxItem
from scrapy_splash import SplashRequest

class LivrosSpider(CrawlSpider):
    name = 'livros'
    allowed_domains = ['www.olx.pt']
    start_urls = ['https://www.olx.pt/lazer/livros-revistas/historia/']

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        item_links = response.css('.large > .detailsLink::attr(href)').extract()
        for a in item_links:
            yield SplashRequest(a, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        title = response.css('h1::text').extract()[0].strip()
        price = response.css('.pricelabel > strong::text').extract()[0]

        item = OlxItem()
        item['title'] = title
        item['price'] = price
        item['url'] = response.url
        yield item

items.py

import scrapy

class OlxItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    price = scrapy.Field()
    url = scrapy.Field()

settings.py

BOT_NAME = 'olx'

SPIDER_MODULES = ['olx.spiders']
NEWSPIDER_MODULE = 'olx.spiders'

FEED_URI = 'data/%(name)s/%(time)s.json'
FEED_FORMAT = 'json'

ROBOTSTXT_OBEY = True

# scrapy-splash settings
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

And below is the output that I keep getting in the terminal:

2018-05-15 07:47:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=7> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=6)
2018-05-15 07:47:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=8> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=7)
2018-05-15 07:47:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=9> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=8)
2018-05-15 07:47:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=10> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=9)
2018-05-15 07:47:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=11> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=10)
2018-05-15 07:47:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.pt/lazer/livros-revistas/historia/?page=12> (referer: https://www.olx.pt/lazer/livros-revistas/historia/?page=11)

In the end, the program is supposed to save the data to a JSON file, but the file always comes out empty. Can you help me figure out what I am missing?

Solution

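The following change works for me: select .x-large instead of .large in parse_item. The .large selector apparently matches nothing on the listing pages, so parse_item yields no requests and the output file stays empty: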

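def parse_item(self, response):
    # Item links on the listing page sit under .x-large, not .large
    item_links = response.css('.x-large > .detailsLink::attr(href)').extract()
    for a in item_links:
        # Render each detail page through Splash before parsing it
        yield SplashRequest(a, callback=self.parse_detail_page)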
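
A selector that matches nothing does not raise an error: extract() just returns an empty list, the for loop never runs, and the log shows only "Crawled (200)" lines. A small guard can make this visible. The sketch below is an illustration only; the helper name checked_links and the logger name are hypothetical and not part of Scrapy or scrapy-splash.

```python
import logging

# Hypothetical logger name; inside a spider you could use self.logger instead.
logger = logging.getLogger("livros")


def checked_links(links, page_url, selector):
    """Return links unchanged, but log a warning if the selector matched nothing."""
    if not links:
        # An empty list means the CSS selector found no elements on this page.
        logger.warning("Selector %r matched no links on %s", selector, page_url)
    return links
```

In parse_item this could wrap the extraction, for example: item_links = checked_links(response.css(sel).extract(), response.url, sel), where sel holds the CSS selector string. A warning on every listing page would then point straight at the selector.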