pythonloggingmultiprocessingray

How to Configure Ray to Use Standard Python Logger for Multiprocessing?


I am trying to use Ray to improve the speed of a process and I want the log messages to be passed to the standard Python logger. This way, the application can handle formatting, filtering, and saving the log messages. However, when I use Ray, the log messages are not formatted according to my logger configuration and are not passed back to the root logger. I tried setting log_to_driver=True and configure_logging=True in ray.init() , but it didn't solve the problem. How can I configure Ray to use the standard Python logger for multiprocessing?"

Here is an example that should demonstrate the issue:

from ray.util.multiprocessing import Pool
import pathlib
import logging
import json

def setup_logging(config_file: pathlib.Path):
    with open(config_file) as f_in:
        config = json.load(f_in)
    logging.config.dictConfig(config)

logger = logging.getLogger(__name__)
config_file = pathlib.Path(__file__).parent / "log_setup/config_logging.json"
setup_logging(config_file=config_file)

def f(index):
    logger.warning(f"index: {index}")
    return (index, "model")

if __name__ == "__main__":
    logger.warning("Starting")
    pool = Pool(1)
    results = pool.map(f, range(10))
    print(list(results))

where I have the config of the logger as:

{
    "version": 1,
    "disable_existing_loggers": false,
    "formatters": {
      "detailed": {
        "format": "[%(levelname)s|%(name)s|%(module)s|L%(lineno)d] %(asctime)s: %(message)s",
        "datefmt": "%Y-%m-%dT%H:%M:%S%z"
      }
    },
    "handlers": {
      "stdout": {
        "class": "logging.StreamHandler",
        "level": "INFO",
        "formatter": "detailed"
      }
    },
    "loggers": {
      "root": {
        "level": "DEBUG",
        "handlers": [
          "stdout"
        ]
      }
    }
  }

If I just use the python map, I would get the following printed:

[WARNING|__main__|ray_trial|L28] 2024-03-27T15:14:21+0100: Starting
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 0
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 1
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 2
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 3
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 4
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 5
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 6
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 7
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 8
[WARNING|__main__|ray_trial|L22] 2024-03-27T15:14:21+0100: index: 9

But when I use Ray I get:

2024-03-27 14:54:55,064 INFO worker.py:1743 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
(PoolActor pid=43261) index: 0
(PoolActor pid=43261) index: 1
(PoolActor pid=43261) index: 2
(PoolActor pid=43261) index: 3
(PoolActor pid=43261) index: 4
(PoolActor pid=43261) index: 5
(PoolActor pid=43261) index: 6
(PoolActor pid=43261) index: 7
(PoolActor pid=43261) index: 8
(PoolActor pid=43261) index: 9

Solution

  • As per the Ray documentation, every worker sets up its own logging, and so if you want to change the logging, you need to initialize it for every worker.

    The simplest way to do so is also given in the Ray documentation:

    # driver.py
    def logging_setup_func():
        logger = logging.getLogger("ray")
        logger.setLevel(logging.DEBUG)
        warnings.simplefilter("always")
    
    ray.init(runtime_env={"worker_process_setup_hook": logging_setup_func})
    
    logging_setup_func()
    

    You can then use ray's implementation of Pool as normally.