python, web-scraping, scrapy

`scrapy bench` errors with `AssertionError` on execution


I ran this command to install Scrapy: `conda install -c conda-forge scrapy pylint autopep8 -y`

Then I ran `scrapy bench` and got the error below. The same thing happens with a global installation via pip. Please help, as I can't understand the reason for this error.

scrapy bench
2025-01-25 13:52:30 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2025-01-25 13:52:30 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.13.5, cssselect 1.2.0, parsel 1.10.0, w3lib 2.2.1, Twisted 24.11.0, Python 3.12.8 | packaged by conda-forge | (main, Dec  5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)], pyOpenSSL 25.0.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-11-10.0.26100-SP0
2025-01-25 13:52:31 [scrapy.addons] INFO: Enabled addons:
[]
2025-01-25 13:52:31 [scrapy.extensions.telnet] INFO: Telnet Password: 1d038a25605956ac
2025-01-25 13:52:31 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.closespider.CloseSpider',
 'scrapy.extensions.logstats.LogStats']
2025-01-25 13:52:31 [scrapy.crawler] INFO: Overridden settings:
{'CLOSESPIDER_TIMEOUT': 10, 'LOGSTATS_INTERVAL': 1, 'LOG_LEVEL': 'INFO'}
2025-01-25 13:52:32 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2025-01-25 13:52:32 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2025-01-25 13:52:32 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2025-01-25 13:52:32 [scrapy.core.engine] INFO: Spider opened
2025-01-25 13:52:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2025-01-25 13:52:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2025-01-25 13:52:32 [scrapy.core.scraper] ERROR: Spider error processing <GET http://localhost:8998?total=100000&show=20> (referer: None)
Traceback (most recent call last):
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\utils\defer.py", line 327, in iter_errback
    yield next(it)
          ^^^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\utils\python.py", line 368, in __next__
    return next(self.data)
           ^^^^^^^^^^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\utils\python.py", line 368, in __next__
    return next(self.data)
           ^^^^^^^^^^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    yield from iterable
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\spidermiddlewares\referer.py", line 379, in <genexpr>
    return (self._set_referer(r, response) for r in result)
                                                    ^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    yield from iterable
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 57, in <genexpr>
    return (r for r in result if self._filter(r, spider))
                       ^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    yield from iterable
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\spidermiddlewares\depth.py", line 54, in <genexpr>
    return (r for r in result if self._filter(r, response, spider))
                       ^^^^^^
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
    yield from iterable
  File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\commands\bench.py", line 70, in parse
    assert isinstance(Response, TextResponse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
2025-01-25 13:52:32 [scrapy.core.engine] INFO: Closing spider (finished)
2025-01-25 13:52:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 241,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 1484,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.140934,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2025, 1, 25, 8, 22, 32, 389327, tzinfo=datetime.timezone.utc),
 'items_per_minute': None,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'response_received_count': 1,
 'responses_per_minute': None,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/AssertionError': 1,
 'start_time': datetime.datetime(2025, 1, 25, 8, 22, 32, 248393, tzinfo=datetime.timezone.utc)}
2025-01-25 13:52:32 [scrapy.core.engine] INFO: Spider closed (finished)

Solution

  • This is a bug in Scrapy, introduced in 2.12.0.

    It's passing the wrong argument to `isinstance()`. That function expects the first argument to be the object being checked (see the docs), but the bench command is currently passing the `Response` class itself, which leads to the `AssertionError` we can see in your logs:

      File "C:\Users\Risha\anaconda3\envs\scrapy\Lib\site-packages\scrapy\commands\bench.py", line 70, in parse
        assert isinstance(Response, TextResponse)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    AssertionError
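
    A quick way to see why that assertion can never pass: `Response` here is the class object itself (an instance of `type`), never an instance of `TextResponse`. A minimal sketch (the URL and body below are just placeholders):

      from scrapy.http import Response, TextResponse

      # Passing the class object: always False, so the assert always fails
      print(isinstance(Response, TextResponse))  # False

      # Passing an actual response object is what the check intends
      resp = TextResponse(url="http://example.com", body=b"<html/>", encoding="utf-8")
      print(isinstance(resp, TextResponse))      # True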
    

    I submitted a PR with a fix here, replacing the `Response` class passed as an argument with the `response` object. The PR was merged, but a new version hasn't been released yet.
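
    The fix itself is a one-line change in `scrapy/commands/bench.py`, sketched below as a before/after (lowercase `response` is the object the spider callback receives):

      assert isinstance(Response, TextResponse)  # 2.12.0: checks the class, always fails
      assert isinstance(response, TextResponse)  # fixed: checks the response object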

    Therefore, to move forward, you can choose one of the options below:

    a) Clone the Scrapy repository and install it from the latest master (see the example commands after this list)

    b) Downgrade your Scrapy version to 2.11.2

    c) Wait until Scrapy officially releases the fix (likely in version 2.13)
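
    For options (a) and (b), the commands would look roughly like this (pip shown; in a conda environment, `conda install -c conda-forge scrapy=2.11.2` does the downgrade):

      # option (a): install from the latest master, which includes the merged fix
      pip install git+https://github.com/scrapy/scrapy.git

      # option (b): pin to the last release before the regression
      pip install scrapy==2.11.2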