python parallel-processing luigi

Luigi - parallel branches


I am completely new to Luigi and I already have a problem that I can't seem to fix.

So let's say I want something like this:

[image]

So basically my question is: how can I call a Task multiple times, for as many instances as I want, and keep adding other tasks in the same way to each "branch"?

I know it should work with the requires() function, e.g. by returning something like [Task(x) for x in range(10)], but I can't find the right syntax or way of doing it.

I hope someone can help me; I'd be very grateful!

Best regards and thanks in advance


Solution

  • Luigi identifies a task by its class and parameter values, and treats it as complete once its output exists. Two instances that write to the same output path are therefore effectively the same unit of work. So what you can do is make multiple instances of the same class with different output paths.

    The recommended way to do this in Luigi is to add a parameter to your task class and use this parameter in the construction of your output path.

    For example,

    import os

    import luigi

    class TaskA(luigi.Task):
        number = luigi.IntParameter()

        def output(self):
            base_path = 'path/to/a/dir'
            # Zero-pad the number to five digits, e.g. 7 -> '00007.txt'
            file_name = '%5.5d.txt' % self.number
            return luigi.LocalTarget(os.path.join(base_path, file_name))
    

    Now you can call this task several times in another task by:

    class TaskB(luigi.Task):
        n = luigi.IntParameter()  # number of TaskA instances to require

        def requires(self):
            return [TaskA(number=i) for i in range(self.n)]
    

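    Note that the second time you run TaskB, its TaskA requirements for 0 to n-1 are already satisfied because their output files exist, so they are not run again. If you always want to execute TaskA again, you should add some randomness to its output path so that the targets do not exist yet.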