Tags: python, logging, colors, scrapy

Scrapy framework - Colorize logging


I am trying to make Scrapy output colorized logs. I am not very familiar with Python logging, but my understanding is that I must write my own Formatter and make Scrapy use it. I succeeded in writing a Formatter that colorizes the output using Clint.

My problem is that I can't get it to work correctly within Scrapy. I expected the logger object in my spider to have a handler, whose formatter I could then swap. But when I look inside spider.logger.logger, I see that handlers is an empty list. I tried adding my formatter through a new stream handler:

crawler.spider.logger.logger.addHandler(sh) where sh is a handler using my color formatter.

This makes Scrapy output each message twice. The first copy is colorized but lacks Scrapy's formatting. The second has Scrapy's formatting but no colors.

How can I make Scrapy output colorized logs while keeping the format that can be set in settings.py?

Thanks


Solution

  • If you only want to colorize the fields of each log record, you can customize LOG_FORMAT in settings.py with ANSI escape codes.

    Example:

    LOG_FORMAT = '\x1b[0;0;34m%(asctime)s\x1b[0;0m \x1b[0;0;36m[%(name)s]\x1b[0;0m \x1b[0;0;31m%(levelname)s\x1b[0;0m: %(message)s'
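
To check such a format string without starting a crawl, it can be fed to the standard library's logging.Formatter directly. The following is only a sketch: the color constants and the preview helper are names made up for this illustration, and the escape codes are written in the shorter form \x1b[34m, which should render the same as the \x1b[0;0;34m variant above.

```python
import logging

# ANSI SGR escape sequences; "\x1b[0m" resets all attributes.
BLUE, CYAN, RED, RESET = "\x1b[34m", "\x1b[36m", "\x1b[31m", "\x1b[0m"

# Same layout as the LOG_FORMAT above, built from named constants.
LOG_FORMAT = (
    f"{BLUE}%(asctime)s{RESET} "
    f"{CYAN}[%(name)s]{RESET} "
    f"{RED}%(levelname)s{RESET}: %(message)s"
)


def preview(fmt, message="Spider opened"):
    """Format one hand-built INFO record with plain logging and return the line."""
    record = logging.makeLogRecord(
        {
            "name": "scrapy.core.engine",  # example logger name
            "levelno": logging.INFO,
            "levelname": "INFO",
            "msg": message,
        }
    )
    return logging.Formatter(fmt).format(record)


if __name__ == "__main__":
    # Printing to a terminal shows the colors as Scrapy would emit them.
    print(preview(LOG_FORMAT))
```

Once the preview looks right, the same string can be assigned to LOG_FORMAT in settings.py.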
    

    If you also want each log level to get its own color, you can override scrapy.utils.log._get_handler (source code).

    Put this near the top of your settings.py:

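    import copy
    
    import scrapy.utils.log
    
    # Keep a reference to Scrapy's original (private) handler factory.
    _get_handler = copy.copy(scrapy.utils.log._get_handler)
    
    
    def _get_handler_custom(*args, **kwargs):
        # Build the handler as Scrapy normally would, then swap its formatter.
        handler = _get_handler(*args, **kwargs)
        handler.setFormatter(your_custom_formatter)  # any logging.Formatter instance
        return handler
    
    # Make Scrapy use the wrapper from now on.
    scrapy.utils.log._get_handler = _get_handler_custom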
    

    The wrapper calls the original _get_handler, replaces the formatter on the handler it returns, and is then assigned back to scrapy.utils.log._get_handler so that Scrapy uses it. This is a hacky solution, since _get_handler is a private function, and it might not be best practice, but it works.
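
The name your_custom_formatter in the snippet above is a placeholder. As a rough illustration of what could be plugged in there using only the standard library, here is a formatter that wraps each line in a color chosen by level. The class name and the color mapping are assumptions made for this sketch, and the format and date strings mirror what I understand to be Scrapy's defaults for LOG_FORMAT and LOG_DATEFORMAT.

```python
import logging

RESET = "\x1b[0m"

# Hypothetical level-to-color mapping; adjust to taste.
LEVEL_COLORS = {
    logging.DEBUG: "\x1b[34m",       # blue
    logging.INFO: "\x1b[36m",        # cyan
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}


class LevelColorFormatter(logging.Formatter):
    """Wrap each formatted line in the color assigned to its level."""

    def format(self, record):
        line = super().format(record)
        color = LEVEL_COLORS.get(record.levelno)
        # Levels without a mapping are left uncolored.
        return f"{color}{line}{RESET}" if color else line


# Assumed to match Scrapy's default log layout and date format.
your_custom_formatter = LevelColorFormatter(
    fmt="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
```

Because the color is applied after the normal formatting step, the layout of each line stays whatever the format string says.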

    A cleaner way to achieve this is to subclass logging.StreamHandler. There are several discussions on Stack Overflow that can point you in the right direction.

    Here is the full working code I use in my projects (it relies on the third-party package colorlog).
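
The StreamHandler approach mentioned above might look roughly like the sketch below. It is not taken from Scrapy or from any of those discussions: ColorStreamHandler, force_color, and the color mapping are hypothetical names. The handler leaves formatting to whatever Formatter is attached and only adds color around the finished line, and only when the stream is a terminal (or when forced). How to install it in place of Scrapy's own handler is left open here.

```python
import io
import logging

RESET = "\x1b[0m"

# Hypothetical level-to-color mapping.
LEVEL_COLORS = {
    logging.DEBUG: "\x1b[34m",       # blue
    logging.INFO: "\x1b[36m",        # cyan
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}


class ColorStreamHandler(logging.StreamHandler):
    """StreamHandler that colors the already formatted line by level."""

    def __init__(self, stream=None, force_color=False):
        super().__init__(stream)
        self.force_color = force_color

    def _use_color(self):
        # Skip escape codes when output goes to a file or pipe.
        isatty = getattr(self.stream, "isatty", None)
        return self.force_color or bool(isatty and isatty())

    def format(self, record):
        line = super().format(record)  # uses the attached Formatter
        color = LEVEL_COLORS.get(record.levelno)
        if color and self._use_color():
            return f"{color}{line}{RESET}"
        return line


if __name__ == "__main__":
    buffer = io.StringIO()
    handler = ColorStreamHandler(buffer, force_color=True)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    demo = logging.getLogger("color_demo")  # hypothetical logger name
    demo.propagate = False
    demo.addHandler(handler)
    demo.warning("slow response")
    print(repr(buffer.getvalue()))
```

Keeping the color logic in the handler rather than in the formatter means the format configured elsewhere is untouched, which is the property the question asks for.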

    settings.py

    import copy
    
    from colorlog import ColoredFormatter
    import scrapy.utils.log
    
    color_formatter = ColoredFormatter(
        (
            '%(log_color)s%(levelname)-5s%(reset)s '
            '%(yellow)s[%(asctime)s]%(reset)s'
            '%(white)s %(name)s %(funcName)s %(bold_purple)s:%(lineno)d%(reset)s '
            '%(log_color)s%(message)s%(reset)s'
        ),
        datefmt='%y-%m-%d %H:%M:%S',
        log_colors={
            'DEBUG': 'blue',
            'INFO': 'bold_cyan',
            'WARNING': 'red',
            'ERROR': 'bg_bold_red',
            'CRITICAL': 'red,bg_white',
        }
    )
    
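    # Keep a reference to Scrapy's original (private) handler factory.
    _get_handler = copy.copy(scrapy.utils.log._get_handler)
    
    def _get_handler_custom(*args, **kwargs):
        # Build the handler as Scrapy normally would, then swap in the colored formatter.
        handler = _get_handler(*args, **kwargs)
        handler.setFormatter(color_formatter)
        return handler
    
    # Make Scrapy use the wrapper from now on.
    scrapy.utils.log._get_handler = _get_handler_custom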