python scrapy pytest python-asyncio twisted

Getting scrapy and pytest to work with AsyncioSelectorReactor


To reproduce my issue

In bookspider.py I have:

from typing import Iterable

import scrapy
from scrapy.http import Request


class BookSpider(scrapy.Spider):
    name = None

    def start_requests(self) -> Iterable[Request]:
        yield scrapy.Request("https://books.toscrape.com/")

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": self.name,
                "title": book.css("h3 a::text").get().strip(),
            }

In test_bookspider.py I have:

import json
import os

from pytest_twisted import inlineCallbacks
from scrapy.crawler import CrawlerRunner

from bookspider import BookSpider


@inlineCallbacks
def test_bookspider():
    runner = CrawlerRunner(
        settings={
            "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
            "FEEDS": {"books.json": {"format": "json"}},
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            # "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor",
        }
    )
    # Wait for the crawl to finish before inspecting the feed file.
    yield runner.crawl(BookSpider, name="books")

    with open("books.json", "r") as f:
        books = json.load(f)
    assert len(books) >= 1
    assert books[0]["name"] == "books"
    assert books[0]["title"] == "A Light in the ..."

    os.remove("books.json")

With "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor" uncommented I get the following error:

Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

With "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor" uncommented instead, my test passes.

Can anyone explain this behaviour and more broadly how to test CrawlerRunner or CrawlerProcess with pytest?


Solution

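  • pytest-twisted installs a reactor before your tests run, and Scrapy raises the error above when that reactor is not the one named in TWISTED_REACTOR. If you use pytest-twisted, tell it to install the matching reactor by passing --reactor=asyncio to your pytest command; otherwise it installs the default reactor, which on your machine is SelectReactor (hence the passing test with that setting). See https://github.com/pytest-dev/pytest-twisted#using-the-plugin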
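The mismatch error comes from comparing the reactor already installed in the process with the one requested in the settings. The sketch below mimics that kind of comparison in plain Python so the failure mode is easier to see. The helper names and FakeSelectReactor are hypothetical, and this is an illustration, not Scrapy's actual code.

```python
def dotted_path(obj: object) -> str:
    """Return 'module.ClassName' for the class of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def check_installed_reactor(installed: object, requested: str) -> None:
    """Raise if the installed reactor's class path differs from the requested one.

    Hypothetical helper: it only illustrates the comparison behind the
    error message and is not taken from Scrapy.
    """
    actual = dotted_path(installed)
    if actual != requested:
        raise RuntimeError(
            f"The installed reactor ({actual}) does not match "
            f"the requested one ({requested})"
        )


class FakeSelectReactor:
    """Stand-in for whichever reactor was installed first in the process."""


if __name__ == "__main__":
    reactor = FakeSelectReactor()
    # Passes: the requested path is the one that is actually installed.
    check_installed_reactor(reactor, dotted_path(reactor))
    # Raises RuntimeError: something else was installed before the check ran.
    check_installed_reactor(
        reactor, "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
    )
```

Only one reactor can be installed per process, so the fix is to make sure the right one is installed first, for example by running pytest --reactor=asyncio test_bookspider.py.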

    "how to test CrawlerRunner or CrawlerProcess with pytest?"

    You shouldn't use CrawlerProcess inside pytest tests, because it starts and stops the reactor for you. If you really need to test it, write tests that spawn a separate process for each CrawlerProcess invocation.