python scrapy python-asyncio twisted

Learning Python web scraping: AttributeError when I run scrapy crawl on a spider


I'm learning Python web scraping with Scrapy. I followed the tutorial exactly, but I got an error. Please help!

My Python code:

import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        # Each book on the listing page is an <article class="product_pod"> element.
        books = response.css("article.product_pod")

        for book in books:
            yield {
                "name": book.css("h3 a::text").get(),
                "price": book.css(".product_price .price_color::text").get(),
                "url": book.css("h3 a").attrib["href"],
            }

The terminal shows:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\python\venv\bookscraper\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 161, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 114, in _run_print_help
    func(*a, **kw)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 169, in _run_command
    cmd.run(args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\commands\crawl.py", line 30, in run
    self.crawler_process.start()
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\crawler.py", line 390, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\utils\ossignal.py", line 19, in install_shutdown_handlers
    reactor._handleSignals()
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'

The ossignal.py file:

import signal

signal_names = {}
for signame in dir(signal):
    if signame.startswith("SIG") and not signame.startswith("SIG_"):
        signum = getattr(signal, signame)
        if isinstance(signum, int):
            signal_names[signum] = signame


def install_shutdown_handlers(function, override_sigint=True):
    """Install the given function as a signal handler for all common shutdown
    signals (such as SIGINT, SIGTERM, etc). If override_sigint is ``False`` the
    SIGINT handler won't be install if there is already a handler in place
    (e.g.  Pdb)
    """
    from twisted.internet import reactor

    reactor._handleSignals()
    signal.signal(signal.SIGTERM, function)
    if signal.getsignal(signal.SIGINT) == signal.default_int_handler or override_sigint:
        signal.signal(signal.SIGINT, function)
    # Catch Ctrl-Break in windows
    if hasattr(signal, "SIGBREAK"):
        signal.signal(signal.SIGBREAK, function)

Solution

  • As pointed out in my comment, the Scrapy project is already tracking this issue (see here). It comes from one of Scrapy's dependencies, Twisted: version 23.8.0 was released a day ago, and with it the reactor no longer seems to have the _handleSignals attribute that Scrapy's ossignal.py calls.

    Another user fixed the issue by installing an earlier version of Twisted (see here):

    pip install Twisted==22.10.0

    Until the issue is fixed and a new release is out, I suggest staying on that earlier version.
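
To confirm which Twisted release a virtual environment actually has, before or after pinning, a small check along these lines may help. It is only a sketch: parse_release, is_affected and installed_twisted_version are hypothetical helper names, the 23.8.0 threshold is taken from the report above, and the check assumes the installed Scrapy still calls reactor._handleSignals().

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '23.8.0' into (23, 8, 0); stop at the first non-numeric component."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_affected(twisted_version, first_bad=(23, 8, 0)):
    """Assumption: every release from first_bad onward triggers the error."""
    return parse_release(twisted_version) >= first_bad


def installed_twisted_version():
    """Return the installed Twisted version string, or None if it is missing."""
    try:
        return version("Twisted")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_twisted_version()
    if found is None:
        print("Twisted is not installed in this environment.")
    elif is_affected(found):
        print(f"Twisted {found} is installed; consider pinning Twisted==22.10.0.")
    else:
        print(f"Twisted {found} is installed; it predates the reported breakage.")
```

Run it with the same interpreter that runs scrapy (for example, inside the activated bookscraper venv), since a different interpreter may see a different Twisted.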