python scrapy

Only make HEAD requests when using the Scrapy crawling framework


When using the Python crawling framework Scrapy, I only want to check the HTTP status codes of a few thousand domains and nothing else, as a fast and efficient initial crawl.

How can I make Scrapy send only HEAD requests instead of the default GET requests?


Solution

  • Pass method="HEAD" to scrapy.Request. The method argument sets the HTTP method and defaults to "GET".

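    import scrapy

    class StatusSpider(scrapy.Spider):
        name = "status"
        start_urls = ["https://example.com"]  # replace with your domains
        def start_requests(self):
            for url in self.start_urls:
                yield scrapy.Request(url, method="HEAD")  # status line + headers only
        def parse(self, response):
            yield {"url": response.url, "status": response.status}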