Tags: scrapy, web-crawler, scrapy-splash, splash-js-render

How to set cookies in Scrapy+Splash when JavaScript makes multiple requests?


When the JavaScript loads, it makes another AJAX request, and cookies should be set from that response. However, Splash does not keep any cookies across multiple requests. Is there a way to keep the cookies across all requests, or to assign them manually between requests?


Solution

  • Yes, there is an example in the scrapy-splash README; see the Session Handling section. In short: first make sure that all settings are correct, then use SplashRequest(url, endpoint='execute', args={'lua_source': script}) to send Scrapy requests. The rendering script should look like this:

    function main(splash)
        splash:init_cookies(splash.args.cookies)
    
        -- ... your script
    
        return {
            cookies = splash:get_cookies(),
            -- ... other results, e.g. html
        }
    end
    

    There is also a complete example with cookie handling, header handling, etc. in the scrapy-splash README; see the last example there.
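To make the two steps in the answer concrete, here is a minimal sketch of the settings and of the request arguments. The middleware names and order numbers are the ones listed in the scrapy-splash README. The Splash address, the 0.5 second wait, the `html` result key, and the helper names `splash_kwargs`, `make_splash_request` and `parse` are assumptions made for illustration, not part of the original answer.

```python
# --- settings.py (middleware names and orders as in the scrapy-splash README) ---
SPLASH_URL = "http://localhost:8050"  # assumption: Splash running locally

DOWNLOADER_MIDDLEWARES = {
    # The cookies middleware has a lower order number than SplashMiddleware.
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {"scrapy_splash.SplashDeduplicateArgsMiddleware": 100}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"

# --- spider module ---
# Same shape as the rendering script above, with the elided parts filled in
# as an example: load the page, wait for the AJAX call, return cookies + html.
LUA_SCRIPT = """
function main(splash)
    splash:init_cookies(splash.args.cookies)

    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))  -- assumption: long enough for the AJAX request

    return {
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""


def splash_kwargs(script=LUA_SCRIPT):
    """Keyword arguments for SplashRequest, as given in the answer."""
    return {"endpoint": "execute", "args": {"lua_source": script}}


def make_splash_request(url, callback):
    """Build the request (hypothetical helper)."""
    # Imported here so the sketch can be loaded without scrapy-splash installed.
    from scrapy_splash import SplashRequest

    return SplashRequest(url, callback, **splash_kwargs())


def parse(response):
    """Callback sketch: response.data is the decoded table returned by main()."""
    return {"cookies": response.data["cookies"], "html": response.data["html"]}
```

In a spider, `start_requests` would yield `make_splash_request(url, self.parse)`. Because the script returns a `cookies` key, the cookies middleware can merge those cookies into the session and pass them back through `splash.args.cookies` on the next request.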