I'm trying to use Scrapy to download my Quora answers, but I can't even seem to be able to download my page. Using the simple
scrapy shell 'http://it.quora.com/profile/Ferdinando-Randisi'
returns this error
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: quora)
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'quora.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'SPIDER_MODULES': \[quora.spiders'], 'BOT_NAME': 'quora', 'LOGSTATS_INTERVAL': 0}
....
2017-10-05 22:16:53 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-10-05 22:16:53 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-10-05 22:16:53 [scrapy.core.engine] INFO: Spider opened
2017-10-05 22:16:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/robots.txt> from <GET http://it.quora.com/robots.txt>
2017-10-05 22:16:55 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/robots.txt> (referer: None)
2017-10-05 22:16:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/profile/Ferdinando-Randisi> from <GET http://it.quora.com/profile/Ferdinando-Randisi>
2017-10-05 22:16:56 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/profile/Ferdinando-Randisi> (referer: None)
2017-10-05 22:16:58 [root] DEBUG: Using default logger
What's wrong? Error 429 is associated with too many requests, but I'm making only one request. Why would that be too many?
It blocks Scrapy based on user agent string. Try to mimic e.g. Chromium:
scrapy shell "http://it.quora.com/profile/Ferdinando-Randisi" -s USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.52 Safari/537.36"