kubeflow, kubeflow-pipelines

kubeflow OutputPath/InputPath question when writing/reading multiple files


I have a data-fetch stage where I get multiple DataFrames (DFs) and serialize them. I'm currently treating OutputPath as a directory: I create it if it doesn't exist, and then serialize all the DFs into that path, with a different file name for each DF.

In a subsequent pipeline stage (say, predict) I need to retrieve all those through InputPath.

Now, from the documentation it seems that InputPath/OutputPath are meant to be used as files. Does Kubeflow have any limitation if I use them as a directory?
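
For concreteness, here is a minimal sketch of the kind of component I mean (KFP v1 SDK; the fetch_data name, the CSV format, and the example frames are just illustrative):

```python
from kfp.components import OutputPath, create_component_from_func

def fetch_data(output_dir_path: OutputPath()):
    """Fetches several DataFrames and serializes each one as a separate
    file inside the output path, treating that path as a directory."""
    import os
    import pandas as pd

    # OutputPath() only hands the component a path string; the component
    # itself turns that path into a directory.
    os.makedirs(output_dir_path, exist_ok=True)

    # Illustrative stand-ins for the real fetched DataFrames.
    dfs = {
        'train': pd.DataFrame({'x': [1, 2, 3], 'y': [0, 1, 0]}),
        'holdout': pd.DataFrame({'x': [4, 5], 'y': [1, 0]}),
    }
    for name, df in dfs.items():
        # One file per DataFrame, all under the same output directory.
        df.to_csv(os.path.join(output_dir_path, f'{name}.csv'), index=False)

fetch_data_op = create_component_from_func(
    fetch_data, packages_to_install=['pandas'])
```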


Solution

  • The ComponentSpec's {inputPath: input_name} and {outputPath: output_name} placeholders and their Python analogs (input_name: InputPath()/output_name: OutputPath()) are designed to support both files/blobs and directories.

    They are expected to provide the path for the input/output data, no matter whether the data is a blob/file or a directory.

    The only limitation is that the UX might not be able to preview such artifacts, but the pipeline itself will work.
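
A minimal end-to-end sketch of that (KFP v1 SDK; component names, file names, and the CSV format are illustrative, not part of the answer): the producer writes several files into its OutputPath, used as a directory, and the consumer receives the same directory through InputPath and reads everything inside it.

```python
import kfp
from kfp.components import InputPath, OutputPath, create_component_from_func

def fetch_data(output_dir_path: OutputPath()):
    # Treat the output path as a directory and write several files into it.
    import os
    import pandas as pd
    os.makedirs(output_dir_path, exist_ok=True)
    for name in ('train', 'holdout'):
        pd.DataFrame({'x': [1, 2, 3]}).to_csv(
            os.path.join(output_dir_path, f'{name}.csv'), index=False)

def predict(input_dir_path: InputPath()):
    # The artifact arrives here as a directory; list it and read every file.
    import os
    import pandas as pd
    for fname in sorted(os.listdir(input_dir_path)):
        df = pd.read_csv(os.path.join(input_dir_path, fname))
        print(fname, df.shape)

fetch_data_op = create_component_from_func(
    fetch_data, packages_to_install=['pandas'])
predict_op = create_component_from_func(
    predict, packages_to_install=['pandas'])

@kfp.dsl.pipeline(name='fetch-and-predict')
def fetch_and_predict():
    fetch_task = fetch_data_op()
    # The directory artifact is passed between steps like any other output.
    predict_op(fetch_task.output)

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(fetch_and_predict, 'fetch_and_predict.yaml')
```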