
Python Luigi: Efficient way to handle missing dependencies


At the moment I am using the Luigi library to build a data pipeline. At the end of the pipeline I have a plot task that looks like this:

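import luigi
import numpy

class PlotAll(luigi.Task):
    ...
    def requires(self):
        return {
            "tool1": analyzeTool1Data(...),
            "tool2": analyzeTool2Data(...),
        }

    def run(self):
        # self.input() mirrors the dict returned by requires(); each value is a
        # Target object, so numpy.load needs the target's .path
        data1 = numpy.load(self.input()["tool1"].path)
        data2 = numpy.load(self.input()["tool2"].path)
        plot(data1, data2)

...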

Sometimes, though, I only have data from Tool1, or only data from Tool2. In that case I would like to plot just the data from that one tool. Is there an elegant way to tell the task that if one dependency is missing, it should ignore that input and work with the rest?

So far my idea is to first check which data is available and then create the dependencies based on that.


Solution

  • The whole idea of Luigi is that it makes sure all dependencies of a task are complete before it runs that task.

    It has no notion of an optional dependency, but you could use dynamic dependencies: tasks that run() yields at run time, so run() can first check which inputs exist and then request only those.

    Another option is to define the task analyzeTool1Data so that it still writes an output even when its processing fails or its data is missing. Luigi then considers the task complete, and PlotAll has to recognize and skip such placeholder outputs.