python-3.x, scrapy, text-compression

How can I save Scrapy logs as gzip after scraping, without using bash scripts?


Are there any ways to compress the logs? I need to store them for some time for later debugging, and it would be helpful to reduce their size. If there is no such method, how can the compression process be organized efficiently?


Solution

  • You can compress the log after the spider has finished running by putting the compression code in the spider's closed method. In the sample below I compress the log file and then delete the original once compression is done. You can improve the code by adding some error handling.

    import scrapy
    import gzip
    import os
    
    class TestSpider(scrapy.Spider):
        name = 'test'
        allowed_domains = ['toscrape.com']
        start_urls = ['https://books.toscrape.com']
    
        custom_settings = {
            # Write Scrapy's log output to a file so it can be compressed later
            'LOG_FILE': 'scrapy.log'
        }
    
        def parse(self, response):
            yield {
                'url': response.url
            }
    
        def closed(self, reason):
            # Called automatically when the spider finishes running.
            # Compress the log file, then remove the uncompressed original.
            with open('scrapy.log', 'rb') as f_in, gzip.open('scrapy.log.gz', 'wb') as f_out:
                f_out.writelines(f_in)
            os.remove('scrapy.log')
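
  • A minimal variation of the same idea (a sketch, not part of the original answer): shutil.copyfileobj streams the file in binary chunks, which scales better for large logs than copying line by line, and reading the path from self.settings avoids hardcoding the filename. The method below is a drop-in replacement for closed in the spider above.

    import gzip
    import os
    import shutil

    def closed(self, reason):
        # Read the log path from the LOG_FILE setting instead of hardcoding it
        log_file = self.settings.get('LOG_FILE')
        if not log_file or not os.path.exists(log_file):
            return
        # Stream the log into the gzip archive in fixed-size chunks
        with open(log_file, 'rb') as f_in, gzip.open(log_file + '.gz', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
        os.remove(log_file)

    Either way, bear in mind that Scrapy can still emit a few final log lines (for example the stats dump) after closed runs, so the very end of the log may not make it into the archive.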