Scrapy Splash: take a screenshot of the entire page


How can I modify the args so that the screenshot captures the entire page?

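import base64

import scrapy
from scrapy_splash import SplashRequest


class ScreenshotSpider(scrapy.Spider):
    name = 'screenshot'

    def start_requests(self):
        url = 'https://example.com'  # placeholder for "some url"
        splash_args = {
            'html': 1,
            'png': 1,
            'width': 600,
        }
        yield SplashRequest(url=url, callback=self.parse,
                            endpoint="render.json",
                            args=splash_args)

    # parse must be a spider method, not nested inside start_requests
    def parse(self, response):
        # render.json returns the screenshot base64-encoded under 'png'
        imgdata = base64.b64decode(response.data['png'])
        filename = 'image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)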

I tried adding 'height' to splash_args. The image then has the size width x height, but the extra height is blank. Is there any way to solve this?


Solution

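  • You can capture the entire page by adding the following line to your Lua script (this means sending the request to Splash's execute endpoint with a Lua script instead of using render.json):

    splash:set_viewport_full()

    UPDATE

    It should be placed at the end of the script, just before returning the HTML. It only resizes the viewport to fit the page as it is at the moment of the call; content that appears afterwards is not covered.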