Tags: scrapy, scrapyd

How to correctly configure CONCURRENT_REQUESTS in a project with multiple spiders


I have a Scrapy project with ~10 spiders, and I run a few of them simultaneously using Scrapyd. However, I have doubts about whether my CONCURRENT_REQUESTS setting is correct.

Currently my CONCURRENT_REQUESTS is 32, but I have seen recommendations that this value be much higher (>= 100). My question is this: is it the total number of concurrent requests that all the running spiders can make together, or the number of concurrent requests that a single spider can make?

I'm assuming it's the total that all spiders can make together, which would explain the recommendation to set it as high as possible. And it looks like I can then regulate the number of concurrent requests each spider makes to any single domain using CONCURRENT_REQUESTS_PER_DOMAIN.
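For reference, a minimal sketch of the relevant part of my settings.py, assuming everything else is left at Scrapy's defaults (32 is my current value; 8 is Scrapy's default per-domain cap):

```python
# settings.py -- project-wide Scrapy settings (illustrative values)

# Maximum number of concurrent requests the downloader will perform
# (Scrapy's default is 16).
CONCURRENT_REQUESTS = 32

# Maximum number of concurrent requests to any single domain
# (Scrapy's default is 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 8
```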


Solution

  • Scrapyd can manage multiple projects, each of which can contain multiple spiders. CONCURRENT_REQUESTS is a project-level setting, so it applies to all spiders in that project. Note, however, that Scrapyd launches each scheduled spider job as its own Scrapy process, so every running spider enforces its own CONCURRENT_REQUESTS limit; the value is not a shared pool across simultaneously running jobs (a per-spider override is sketched after the reference below).

    Reference: issue #463
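If you want one spider to use a different budget than the rest of the project, you can override the project settings per spider. A minimal sketch using Scrapy's custom_settings attribute; the spider name, URL, and values here are hypothetical:

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    # Hypothetical spider, used only for illustration.
    name = "example"
    start_urls = ["https://example.com"]

    # custom_settings overrides the project-wide settings.py values
    # for this spider only. Since Scrapyd runs each job in its own
    # process, these limits apply to this spider's process alone.
    custom_settings = {
        "CONCURRENT_REQUESTS": 100,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 16,
    }

    def parse(self, response):
        # Minimal callback so the sketch runs end to end.
        yield {"url": response.url, "status": response.status}
```

Alternatively, Scrapyd's schedule.json endpoint accepts a setting argument, so a single job can be scheduled with an override, e.g. setting=CONCURRENT_REQUESTS=100.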