
How to stitch Nextflow workflows together across different machines


I want to take the results of one Nextflow workflow that runs on a GPU cluster and feed them into another workflow that runs on a different cluster. From a high-level perspective, what is the best practice for doing so?

Is there a way to change which cluster to run on mid-workflow? If not, how do you switch to another workflow while keeping everything in the same pipeline?

I have tried switching includeConfig statements mid-workflow, but they do not take high enough priority to change where the next section runs.


Solution

  • It would help if you described in more detail what exactly you are trying to achieve. That said, it is possible to use different executors for different processes, because the executor directive can be set per process. You can set it directly in each process or apply it through the label directive, as in the example below:

    In this example (a nextflow.config snippet) I assume you have a local cluster (here using Slurm) that has no GPUs, and that you use a cloud service (here AWS) with dedicated GPU instances. Because the Slurm compute nodes have no internet connection, there is also a process that runs on the head node (local).

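    process {
        executor = 'slurm' // default executor for processes without a matching label
        withLabel: localexecution { executor = 'local' } // runs on the head node
        withLabel: cluster_gpu { executor = 'awsbatch' } // AWS Batch; also needs a queue, a region, and an S3 work directory
    }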
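
    For illustration, a process opts into one of these executors by carrying the matching label. The following is only a minimal sketch: the process names (fetchData, trainModel), the download URL, and train.py are hypothetical and not part of the setup described above.

    ```groovy
    // main.nf (sketch): each label must match a withLabel selector in nextflow.config
    process fetchData {
        label 'localexecution'   // runs on the head node via the local executor

        output:
        path 'input.dat'

        script:
        """
        # hypothetical URL, shown only to motivate running where internet access exists
        curl -sSL -o input.dat https://example.org/input.dat
        """
    }

    process trainModel {
        label 'cluster_gpu'      // dispatched to the executor configured for GPU work

        input:
        path data

        output:
        path 'model.out'

        script:
        """
        # train.py is a hypothetical script
        python train.py --input ${data} --output model.out
        """
    }

    workflow {
        fetchData()
        trainModel(fetchData.out)
    }
    ```

    Keeping the executor choice in the config rather than in the process body means the same pipeline can be pointed at a different infrastructure by swapping the config file or profile.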
    

    If you are not using two different clusters, but rather the same cluster with different queues or features, you should look at the queue, clusterOptions, or containerOptions directives, depending on your use case.
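
    As a rough sketch of that single-cluster case, assuming a Slurm cluster with partitions named cpu and gpu (both names are hypothetical), the label can switch the queue and the scheduler options instead of the executor:

    ```groovy
    // nextflow.config (sketch): one Slurm cluster, two partitions
    process {
        executor = 'slurm'
        queue = 'cpu'                          // hypothetical default partition
        withLabel: cluster_gpu {
            queue = 'gpu'                      // hypothetical GPU partition
            clusterOptions = '--gres=gpu:1'    // passed through to sbatch; requests one GPU
            containerOptions = '--gpus all'    // only relevant if the process runs in Docker
        }
    }
    ```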

    However, I don't think the executor directive can target two different "clusters" of the same type (for example, the same cloud service with different login credentials), because they would share the same config variables, and as far as I know these cannot be changed during a workflow run.

    In such a case you might want to use the local executor and put everything required for the remote execution into the script block of your process (i.e., handle the remote execution yourself by essentially writing a wrapper script).
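
    A wrapper of that kind might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the remote login (user@gpu-login.example.org), the scratch directory, the job script path, and the result file name are hypothetical, and it presumes passwordless SSH to a remote Slurm cluster.

    ```groovy
    // main.nf (sketch): Nextflow runs only the wrapper locally; the real work happens remotely
    process remoteGpuStep {
        executor 'local'

        input:
        path input_file

        output:
        path 'result.out'

        script:
        """
        # Hypothetical remote host, work directory, and job script
        REMOTE=user@gpu-login.example.org
        RDIR=/scratch/nf_remote/\$(basename "\$PWD")
        JOB=/home/user/run_gpu.sh

        # Stage the input; -L copies the file behind the symlink Nextflow creates
        ssh "\$REMOTE" "mkdir -p \$RDIR"
        rsync -aL ${input_file} "\$REMOTE:\$RDIR/"

        # sbatch --wait blocks until the job ends and propagates its exit code
        ssh "\$REMOTE" "cd \$RDIR && sbatch --wait \$JOB ${input_file}"

        # Fetch the result so Nextflow can pick it up as the process output
        rsync -a "\$REMOTE:\$RDIR/result.out" .
        """
    }
    ```

    The trade-off is that Nextflow no longer manages the remote job itself, so retries, resource requests, and cleanup on the remote side have to be handled inside the wrapper.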