argoproj/argo-workflows

Iterate over all files in S3 folder with Argo Workflows


In Argo, I sometimes want to pass each item contained in an S3 folder to a template, using the withSequence: field of the Workflow Step. The best idea I have is a Python step that lists the whole folder, using a process similar to the one I use with CSVs, and transforms the listing into a list of JSON objects. Is there any built-in way of accomplishing this?


Solution

  • Global workflow input parameters are meant to be user input. There are currently no storage-specific automated tools to populate global input parameters.

    You have a couple of options: 1) generate the list of keys inside the Workflow and pass them as parameters to an individual step or 2) generate the list using an external program and pass them in as global parameters to the Workflow.

    For the first option, you could create a step that uses an S3 client to write the keys as a JSON array on disk. Then you could use Argo to pull that file into a step output parameter. Finally, a subsequent step could use withParam (which, unlike withItems, accepts a JSON array passed as a parameter) to loop over the keys; see the first sketch after this answer.

    For the second option, you could use something on your local machine (Bash script, Python script, etc.) to generate a JSON array of S3 keys and pass it (via whatever mechanism you use to submit Workflows) as a global parameter to the Workflow. Then you'd loop over the parameter with withParam, just as in the previous approach; see the second sketch below.
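
    For concreteness, here is a minimal sketch of the first option. It assumes an image with boto3 available, placeholder bucket/prefix names (my-bucket, my/folder/), and that S3 credentials come from your cluster setup (IAM role, secret, etc.):

      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      metadata:
        generateName: s3-fanout-
      spec:
        entrypoint: main
        templates:
        - name: main
          steps:
          - - name: list-keys
              template: list-keys
          - - name: process-key
              template: process-key
              arguments:
                parameters:
                - name: key
                  value: "{{item}}"
              # withParam expands the JSON array produced by the previous step
              withParam: "{{steps.list-keys.outputs.parameters.keys}}"

        # Script template: list the S3 prefix and write the keys as a JSON array
        - name: list-keys
          script:
            image: python:3.11   # assumed to have boto3 installed (or bake your own image)
            command: [python]
            source: |
              import json
              import boto3

              s3 = boto3.client("s3")
              keys = []
              paginator = s3.get_paginator("list_objects_v2")
              for page in paginator.paginate(Bucket="my-bucket", Prefix="my/folder/"):
                  keys.extend(obj["Key"] for obj in page.get("Contents", []))
              with open("/tmp/keys.json", "w") as f:
                  json.dump(keys, f)
          outputs:
            parameters:
            - name: keys
              valueFrom:
                path: /tmp/keys.json   # Argo reads this file into the output parameter

        # One invocation of this template runs per key
        - name: process-key
          inputs:
            parameters:
            - name: key
          container:
            image: alpine:3.19
            command: [sh, -c]
            args: ["echo processing {{inputs.parameters.key}}"]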
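
    The second option moves the listing outside the cluster entirely. Here is a minimal sketch of a Workflow that expects the key list as a global parameter (names are again placeholders):

      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      metadata:
        generateName: s3-fanout-global-
      spec:
        entrypoint: main
        arguments:
          parameters:
          - name: keys          # JSON array of S3 keys, supplied at submit time
        templates:
        - name: main
          steps:
          - - name: process-key
              template: process-key
              arguments:
                parameters:
                - name: key
                  value: "{{item}}"
              # Loop over the globally supplied JSON array
              withParam: "{{workflow.parameters.keys}}"
        - name: process-key
          inputs:
            parameters:
            - name: key
          container:
            image: alpine:3.19
            command: [sh, -c]
            args: ["echo processing {{inputs.parameters.key}}"]

    You would then generate the array however you like and submit it with something such as: argo submit workflow.yaml -p keys='["my/folder/a.csv","my/folder/b.csv"]'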