dynamicdirected-acyclic-graphsargo-workflows

Argo Workflow to continue processing during fan-out


General question here, wondering if anyone has any ideas or experience trying to achieve something I am right now. I'm not entirely sure if its even possible in the argo workflow system...

I'm wondering if it is possible to continue a workflow regardless if a dynamic fanout has finished. By dynamic fanout I mean that B1/B2/B3 can go to B30 potentially.

I want to see if C1 can start when B1 has finished. The B stage is creating a small file which then in C stage I need to run an api request that it has finished and upload said file. But in this scenario B2/B3 still are processing.

And finally, D1 would have to wait for all of C1/2/3-C# to finish to complete

Diagram what I'm trying to achieve

#           *
#           | 
#          A1 (generates a dynamic list that can change depending on the inputs) 
#           | 
#        /  |  \ 
#       B1  B2  B3 +++ B#
#       |   |    |
#       C1         +++ C#
#       *   *   *
#        \  |  /
#         \ | /
#           D1

I was viewing https://github.com/argoproj/argo-workflows/blob/master/docs/enhanced-depends-logic.md but I cant wrap my head around if this is what I need to achieve this. Especially if the fan-out steps are dynamic.

Seems to me that it would bind C stage to the entirety of B stage and require for for B to finish


Solution

  • Something like this should work:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    spec:
      templates:
        - name: main
          steps:
            - - name: A
                template: A
            - - name: B_C
                template: B_C
                arguments:
                  parameters:
                    - name: item
                      value: "{{item}}"
                withParam: "{{steps.A.outputs.parameters.items}}"
            - - name: D
                template: D
        - name: A
          # container or script spec here
          outputs:
            parameters:
              - name: items
                valueFrom:
                  path: /tmp/items.json
        - name: B_C
          inputs:
            parameters:
              - name: item
          steps:
            - - name: B
                template: B
                arguments:
                  parameters:
                    - name: item
                      value: "{{inputs.parameters.item}}"
            - - name: C
                template: C
                arguments:
                  artifacts:
                    - name: file
                      from: "{{steps.B.outputs.artifacts.file}}"
        - name: B
          inputs:
            parameters:
              - name: item
          # container or script spec here
          outputs:
            artifacts:
              - name: file
                path: /tmp/file
        - name: C
          inputs:
            artifacts:
              - name: file
          # container or script spec here
        - name: D
          # container or script spec here
    

    Step B_C in the main template runs instances of the B_C template in parallel.

    Template B_C runs B and C in series. Once template B_C starts, it runs as quickly as possible, completely unaware of any concurrent executions of the B_C template. So C1 blocks only on B1, never on B2 or B3 or any other B#.

    Once all instances of B_C are finished, the main template finally invokes the D template.