python kedro

Is there a way to have files in the Kedro Catalog that are missing?


I have a Kedro pipeline that generates a file which is used again in the next run of the same pipeline. On the first run the file does not exist yet, and a node in the pipeline handles that case. However, Kedro throws a missing-file error before the node gets a chance to run. Is there a way to handle this through Kedro? Maybe a catalog parameter such as missing=True or optional=True, so that Kedro can safely ignore the missing file?

My current workaround is to create an empty file up front and have my node check whether the loaded dataframe is empty.
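For illustration, the node-side check could look roughly like the sketch below. The function name update_history, its arguments, and the "value" column are made up for this example, and it assumes the placeholder file loads as an empty pandas DataFrame (for a CSV that means a header-only file, since pandas refuses to parse a zero-byte file).

```python
import pandas as pd


def update_history(previous: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical node: append this run's rows to the previous run's output."""
    if previous.empty:
        # First run: the placeholder held no rows, so start from this run's data.
        return new_rows.reset_index(drop=True)
    # Later runs: extend the history carried over from the previous run.
    return pd.concat([previous, new_rows], ignore_index=True)


# Simulate the first run (empty placeholder) followed by a later run.
first = update_history(pd.DataFrame(), pd.DataFrame({"value": [1, 2]}))
second = update_history(first, pd.DataFrame({"value": [3]}))
```

This keeps the first-run logic inside the node, but the placeholder file still has to be created before the first run.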


Solution

  • I don't think this is possible.

    I tried to propose a workaround that uses hooks to inject a custom MissingDataSet on the fly, but that approach didn't work: https://github.com/kedro-org/kedro/issues/2690#issuecomment-1607746840

    Apparently DataCatalog is not a singleton, so this is not straightforward.