I'm learning Python scraping with Scrapy. I did exactly what the tutorial teaches, but I got an error. Please help!
My Python code:
import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": book.css("h3 a::text").get(),
                "price": book.css(".product_price .price_color::text").get(),
                "url": book.css("h3 a").attrib["href"],
            }
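The error appears as soon as I start the crawl. For reference, I run the spider with the standard crawl command (the spider name comes from the class above):

scrapy crawl books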
The terminal shows:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\python\venv\bookscraper\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 161, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 114, in _run_print_help
    func(*a, **kw)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 169, in _run_command
    cmd.run(args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\commands\crawl.py", line 30, in run
    self.crawler_process.start()
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\crawler.py", line 390, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\utils\ossignal.py", line 19, in install_shutdown_handlers
    reactor._handleSignals()
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'
The scrapy\utils\ossignal.py file referenced in the traceback:

import signal

signal_names = {}
for signame in dir(signal):
    if signame.startswith("SIG") and not signame.startswith("SIG_"):
        signum = getattr(signal, signame)
        if isinstance(signum, int):
            signal_names[signum] = signame


def install_shutdown_handlers(function, override_sigint=True):
    """Install the given function as a signal handler for all common shutdown
    signals (such as SIGINT, SIGTERM, etc). If override_sigint is ``False`` the
    SIGINT handler won't be installed if there is already a handler in place
    (e.g. Pdb)
    """
    from twisted.internet import reactor

    reactor._handleSignals()
    signal.signal(signal.SIGTERM, function)
    if signal.getsignal(signal.SIGINT) == signal.default_int_handler or override_sigint:
        signal.signal(signal.SIGINT, function)
    # Catch Ctrl-Break in windows
    if hasattr(signal, "SIGBREAK"):
        signal.signal(signal.SIGBREAK, function)
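For what it's worth, I can reproduce the missing attribute directly in a Python shell, without Scrapy, by installing the same asyncio reactor that the error message names (this is just a sanity-check snippet, not part of my project):

from twisted.internet import asyncioreactor

# Force the asyncio-based reactor, then import the installed reactor object.
asyncioreactor.install()
from twisted.internet import reactor

# The reactor type matches the one in the error message...
print(type(reactor).__name__)              # AsyncioSelectorReactor
# ...and the private method Scrapy calls is missing on my install.
print(hasattr(reactor, "_handleSignals"))  # False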
As pointed out in my comment, the issue you are describing is already being tracked by Scrapy here and comes from one of its dependencies, Twisted: a new version, 23.8.0, was released a day ago and appears to cause the problem.
Another user fixed the issue by installing a previous version of Twisted (see here). Basically, he installed the following version of Twisted, which fixed his issue:
pip install Twisted==22.10.0
Until the issue is fixed on Scrapy's side and a new release is out, I suggest pinning the previous Twisted version.
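If you want to confirm what you currently have and verify the downgrade afterwards, you can run these standard pip commands inside the activated virtual environment (nothing Scrapy-specific about them):

pip show Twisted
pip install Twisted==22.10.0
pip show Twisted

If you install dependencies from a requirements file, pinning Twisted==22.10.0 there keeps the next install from pulling 23.8.0 back in.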