I'm trying to scrape a webpage (https://a-z-animals.com/animals/) to get all the animal names listed there.
I installed Scrapy in my PyCharm project. Then, using the terminal in PyCharm, I created a project folder with scrapy startproject AnimalNames. I navigated into that folder and created a spider with scrapy genspider animals https://a-z-animals.com/animals/
Then I added the following code to animals.py, which is meant to retrieve the animal names from the site:
import scrapy


class AnimalsSpider(scrapy.Spider):
    name = "animals"
    allowed_domains = ["a-z-animals.com"]
    start_urls = ["https://a-z-animals.com/animals/"]

    def parse(self, response):
        for container in response.css('div.container'):
            yield {
                container.css('a::text').get()
            }
But PyCharm underlines the parse method parameters, (self, response), and tells me:
Signature of method AnimalsSpider.parse() does not match signature of the base method in class Spider
When I run the spider with scrapy crawl animals -O names_of_animals.json, it just gives me an empty JSON file.
How do I fix this so that it produces a JSON file of all the animal names on the site?
Note that I had to change USER_AGENT and DOWNLOAD_DELAY in settings.py so that the website doesn't block me.
A function signature specifies the form of a function's parameters. When you override a method inherited from a parent class, you should keep the same parameter form as the parent's method.
The parse method is inherited from scrapy.Spider, where it might be defined as def parse(self, response, **kwargs) or def parse(self, response, *args, **kwargs), depending on the version of Scrapy you are using.
Usually, you can fix this warning by changing def parse(self, response) to def parse(self, response, **kwargs).
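To illustrate what "matching the signature" means, here is a minimal sketch that does not need Scrapy installed. BaseSpider is a hypothetical stand-in for scrapy.Spider (it assumes the base method is declared as parse(self, response, **kwargs), which may differ in your Scrapy version), and NarrowSpider, MatchingSpider, and same_signature are names made up for this example.

```python
import inspect


class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider (not the real class)."""

    def parse(self, response, **kwargs):
        raise NotImplementedError


class NarrowSpider(BaseSpider):
    # Drops **kwargs, so the override has a different parameter form
    # than the base method. This is the shape PyCharm complains about.
    def parse(self, response):
        return response


class MatchingSpider(BaseSpider):
    # Same parameter form as the base method, so the signatures match.
    def parse(self, response, **kwargs):
        return response


def same_signature(child, base, method="parse"):
    """Return True if both classes declare the method with identical parameters."""
    child_sig = inspect.signature(getattr(child, method))
    base_sig = inspect.signature(getattr(base, method))
    return child_sig == base_sig


narrow_ok = same_signature(NarrowSpider, BaseSpider)
matching_ok = same_signature(MatchingSpider, BaseSpider)
print(narrow_ok, matching_ok)  # prints: False True
```

In a real spider the equivalent change is just the def line of parse. This sketch only covers the signature warning, not what the spider yields.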