In bookspider.py I have:
from typing import Iterable

import scrapy
from scrapy.http import Request


class BookSpider(scrapy.Spider):
    name = None

    def start_requests(self) -> Iterable[Request]:
        yield scrapy.Request("https://books.toscrape.com/")

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": self.name,
                "title": book.css("h3 a::text").get().strip(),
            }
In test_bookspider.py I have:
import json
import os

from pytest_twisted import inlineCallbacks
from scrapy.crawler import CrawlerRunner
from twisted.internet import defer

from bookspider import BookSpider


@inlineCallbacks
def test_bookspider():
    runner = CrawlerRunner(
        settings={
            "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
            "FEEDS": {"books.json": {"format": "json"}},
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            # "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor",
        }
    )
    yield runner.crawl(BookSpider, name="books")
    with open("books.json", "r") as f:
        books = json.load(f)
    assert len(books) >= 1
    assert books[0]["name"] == "books"
    assert books[0]["title"] == "A Light in the ..."
    os.remove("books.json")
    defer.returnValue(None)
With "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
uncommented I get the following error:
Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
With "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor"
uncommented my test passes.
Can anyone explain this behaviour and, more broadly, how to test CrawlerRunner or CrawlerProcess with pytest?
If you use pytest-twisted, you need to tell it to install an appropriate reactor by passing --reactor=asyncio to your pytest command; otherwise it installs its default reactor, which in your environment is twisted.internet.selectreactor.SelectReactor (as the error message shows). That is also why the test passes when TWISTED_REACTOR requests that same reactor: the requested reactor then matches the one pytest-twisted has already installed. See https://github.com/pytest-dev/pytest-twisted#using-the-plugin
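For example, to run the test file above with the asyncio reactor installed before the test session starts:

    pytest --reactor=asyncio test_bookspider.py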
"how to test CrawlerRunner or CrawlerProcess with pytest?"
You shouldn't use CrawlerProcess in things like pytest tests, because it starts and stops the reactor for you. If you really need to test it, write tests that use a single process per CrawlerProcess invocation.
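As a rough sketch of that approach (the run_spider.py helper and its command-line interface below are hypothetical, not something Scrapy provides), each test launches the CrawlerProcess in its own child interpreter, so pytest never has to share a reactor with it:

    # run_spider.py (hypothetical helper): owns its own CrawlerProcess, so the
    # reactor is started and stopped entirely inside this script's process.
    import sys

    from scrapy.crawler import CrawlerProcess

    from bookspider import BookSpider

    if __name__ == "__main__":
        process = CrawlerProcess(
            settings={
                "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
                # write the feed to the path passed on the command line
                "FEEDS": {sys.argv[1]: {"format": "json"}},
            }
        )
        process.crawl(BookSpider, name="books")
        process.start()  # blocks until the crawl is done and the reactor has stopped


    # test_bookspider_process.py: run the helper in a separate interpreter per test.
    import json
    import subprocess
    import sys

    def test_bookspider_with_crawlerprocess(tmp_path):
        out = tmp_path / "books.json"
        subprocess.run([sys.executable, "run_spider.py", str(out)], check=True)
        books = json.loads(out.read_text())
        assert len(books) >= 1
        assert books[0]["name"] == "books"

Because process.start() blocks and stops the reactor when the crawl finishes, the child process exits cleanly and every test gets a fresh reactor.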