I have a Kedro pipeline that generates a file which is used again as input on the next run of the same pipeline. However, when the pipeline runs for the first time, that file does not exist yet, and that case is handled inside a node. Kedro nevertheless raises a missing-file error when it tries to load the file.
Is there a way to handle this through Kedro? For example, a catalog parameter such as missing=True or optional=True, so that Kedro can safely ignore the missing file?
My current workaround is to create an empty file up front and then check in my node whether the loaded dataframe is empty.
I don't think this is possible.
I tried to propose a workaround that uses hooks to inject a custom MissingDataSet on the fly, but that approach didn't work: https://github.com/kedro-org/kedro/issues/2690#issuecomment-1607746840
Apparently DataCatalog is not a singleton, so this is not straightforward.