
Luigi workflow engine: command-line parameters not passed to workers on macOS


I am having trouble passing global parameters to multiple workers. When I run a workflow with more than one worker, the values given on the command line are not passed to the workers, and the defaults are used instead. I wonder whether this is the same problem observed when Luigi is run on Windows (#2247), except that in my case it happens on macOS. A second issue is that the log format differs: it seems that logging.cfg is not passed to the workers either. I opened a ticket on GitHub (#3236) but got no response.

The following toy example summarizes my problem.


import luigi
import logging

logger = logging.getLogger('luigi-interface')

class HelloConfig(luigi.Config):
    reference = luigi.Parameter(default="World")

class HelloTask(luigi.Task):
    def run(self):
        logger.info("Hello %s!", HelloConfig().reference)

    def requires(self):
        return []

I run the task with:

luigi --module weekly_update.etl.load_ex HelloTask \
      --workers ? \
      --HelloConfig-reference "Mars"

When --workers is 1, the log shows

2023-04-21 10:59:09,386 - luigi-interface - INFO - [MainThread] - Hello Mars!

but when --workers is 2, the default value is used and the log format changes:

2023-04-21 10:59:53,030 [INFO]-load_ex.run: Hello World!

Solution

  • As of Python 3.8, multiprocessing on macOS defaults to the spawn start method instead of fork, so problems that previously appeared only on Windows now show up on macOS as well. You can change the start method with

    import multiprocessing
    
    multiprocessing.set_start_method('fork')
    

    This has to run before any worker process is started. I'm not sure there is a better solution; we are also struggling with command-line parameters across multiple workers on platforms other than Linux.
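
A slightly more defensive version of the call above might look like the sketch below. The helper name use_fork_start_method is made up for illustration and is not part of Luigi; the sketch assumes it is called once, early in whatever script launches the workflow, before any worker process exists.

```python
import multiprocessing


def use_fork_start_method():
    """Switch multiprocessing to 'fork' where the platform offers it.

    Hypothetical helper, not part of Luigi. Returns True if the start
    method was switched, False if 'fork' is unavailable (e.g. Windows).
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        return False
    # force=True avoids a RuntimeError if a start method was already chosen.
    multiprocessing.set_start_method("fork", force=True)
    return True


if __name__ == "__main__":
    switched = use_fork_start_method()
    print("fork enabled:", switched)
    print("current start method:", multiprocessing.get_start_method())
```

Note that the Python documentation describes fork as unsafe on macOS because it can crash the subprocess, so this is a workaround to try rather than a guaranteed fix.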