I'm using Kedro version 0.18.7 and python 3.9 in WSL2.
I'd like to run nodes of my pipeline in parallel by running the command kedro run --pipeline <pipeline_name> --runner ParallelRunner
. According to the documentation ParallelRunner, it should be possible to define the maximum number of CPU cores to use (using max_workers
), but I am struggling to find out how to use this argument. Apparently I cannot just add it to the command like --runner ParallelRunner --max_workers 4
.
Does somebody know how to set max_workers for ParallelRunner?
Previous discussions on max_workers are from older versions of Kedro (for example github issue). I guess I need to create a file somewhere in the project directory and write relevant code, something like runner=ParallelRunner(max_workers=4)
(cli.py? run.py? settings.py?), but other than that I am lost.
Any tips or guidance would be appreciated.
One way that can work is by creating a kedro session to run your pipeline.
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import ParallelRunner
from pathlib import Path
bootstrap_project(Path("<project_root>"))
with KedroSession.create() as session:
session.run(pipeline_name=<pipeline-name>, runner=ParallelRunner(max_workers=4))