
How to set the Scrapy shell's default user agent


I set the default user agent in settings.py, but I still have to add the -s option with the corresponding USER_AGENT value every time I use the scrapy shell.

I know I can work around this with an alias such as alias scrapys="scrapy shell -s USER_AGENT='xxxxx'", but is there a better way?


Solution

  • Solution 1

    Setting USER_AGENT in settings.py should be enough. If that doesn't work for you, please provide more information (for example, print your project structure with the tree command).
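    For reference, a minimal settings.py with a project-wide user agent might look like the sketch below. The project name "project_name" and the user-agent string are placeholders, not values Scrapy requires. Once the shell has loaded the project, settings.get('USER_AGENT') inside scrapy shell shows which value was picked up.

```python
# settings.py of a hypothetical project named "project_name".
# Scrapy reads the module-level UPPERCASE names in this file as project settings.
BOT_NAME = "project_name"

# Placeholder user-agent string; replace it with the one you want to send.
USER_AGENT = "Mozilla/5.0 (compatible; project_name-bot/1.0)"
```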

    For the scrapy shell ... command to read settings.py, make sure that:

    1. You're running the command in the project root, where you can see a scrapy.cfg file.

    2. The module path of settings.py is defined in scrapy.cfg:

      [settings]
      default = project_name.settings
      

      project_name.settings is the module path to settings.py.
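    The two checks above can be approximated with the standard library. This is a simplified, hypothetical sketch for illustration only (the helper names find_scrapy_cfg and settings_module are made up, and this is not Scrapy's own code). It assumes the lookup starts in the current directory, also checks parent directories, and then reads the default key of the [settings] section.

```python
import configparser
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: str = ".") -> Optional[Path]:
    """Return the nearest scrapy.cfg in `start` or one of its parents, else None."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


def settings_module(start: str = ".") -> Optional[str]:
    """Return the module path under [settings] default, e.g. 'project_name.settings'."""
    cfg_path = find_scrapy_cfg(start)
    if cfg_path is None:
        # No project found: in this model, settings.py would never be read.
        return None
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    # fallback=None covers a missing [settings] section or a missing default key.
    return parser.get("settings", "default", fallback=None)
```

    If settings_module() returns None in the directory where you start the shell, your settings.py is not being picked up, which matches the symptom in the question.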

  • Solution 2

    Use the spider class attribute Spider.custom_settings.

    import scrapy


    class MySpider(scrapy.Spider):
        name = 'myspider'

        # Settings in this dict override the project-wide settings.py for this spider.
        custom_settings = {
            'USER_AGENT': 'some value',
        }
    

    The spider-specific custom_settings dict overrides the corresponding values in the global settings.py.
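    Which value wins is decided by setting priorities. The toy model below mimics that ordering in plain Python. The priority names and numbers mirror the ones Scrapy uses for default, project (settings.py), spider (custom_settings) and cmdline (-s), but the PrioritizedSettings class is a hypothetical stand-in, not scrapy.settings.Settings, and the default user-agent string is a placeholder. It also shows why a value passed with -s still beats custom_settings.

```python
from typing import Any, Dict, Tuple

# Subset of Scrapy's setting priority names and values; a higher number wins.
PRIORITIES = {"default": 0, "project": 20, "spider": 30, "cmdline": 40}


class PrioritizedSettings:
    """Hypothetical toy model: a key is overwritten only by an equal or higher priority."""

    def __init__(self) -> None:
        self._values: Dict[str, Tuple[Any, int]] = {}

    def set(self, name: str, value: Any, priority: str) -> None:
        level = PRIORITIES[priority]
        current = self._values.get(name)
        if current is None or level >= current[1]:
            self._values[name] = (value, level)

    def get(self, name: str, default: Any = None) -> Any:
        entry = self._values.get(name)
        return default if entry is None else entry[0]


settings = PrioritizedSettings()
settings.set("USER_AGENT", "built-in default (placeholder)", priority="default")
settings.set("USER_AGENT", "from settings.py", priority="project")
settings.set("USER_AGENT", "some value", priority="spider")  # custom_settings
print(settings.get("USER_AGENT"))  # some value

settings.set("USER_AGENT", "from -s", priority="cmdline")  # scrapy shell -s USER_AGENT=...
print(settings.get("USER_AGENT"))  # from -s
```

    Under this ordering, custom_settings can only take effect when the shell actually runs with that spider class, for example when it is selected with scrapy shell --spider=myspider followed by the URL.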
