
Publishing files to an out directory in Nextflow


I am working on a Nextflow pipeline that has several processes controlled by the main pipeline. One of my processes generates one or more folders whose names are project names. I publish these folders into the pipeline's output folder. The next process generates some CSV files whose prefixes are the same as the folder names generated in the previous step of the pipeline. I want to publish these files inside the corresponding directories.

process COUNT {

    input:
    path stats_file

    output:
    stdout emit: result
    path "*counts.csv", type: "file", emit: counts

    script:
    """
    # Bash call of a Python script that takes one stats file, splits it into
    # multiple *counts.csv files, and writes them as <project_name>_counts.csv
    touch project_1_counts.csv
    touch project_2_counts.csv
    """
}

In the output directory out/pipeline I have two folders created by the previous step: out/pipeline/project_1 and out/pipeline/project_2. I want project_1_counts.csv to be saved in the project_1 directory and project_2_counts.csv to be saved in the project_2 directory.

I know that I have to write a publishDir directive before the input/output sections, but I don't know how to write one that matches the script's output files to the corresponding folders.


Solution

  • You can use the saveAs option with the publishDir directive for this:

    saveAs

    A closure which, given the name of the file being published, returns the actual file name or a full path where the file is required to be stored. This can be used to rename or change the destination directory of the published files dynamically by using a custom strategy. Return the value null from the closure to not publish a file. This is useful when the process has multiple output files, but you want to publish only some of them.

    For example:

    params.outdir = './out'
    
    
    process COUNT {
    
        publishDir (
             path: "${params.outdir}/pipeline",
             mode: 'copy',
             saveAs: { filename ->
                 "${filename.replaceFirst('_counts.csv', '')}/${filename}"
             }
        )
    
        output:
        path "*_counts.csv", emit: counts
    
        script:
        """
        touch project_{1,2,3}_counts.csv
        """
    }
    
    workflow {
    
        COUNT()
    }
    

    Results:

    $ nextflow run main.nf 
    
     N E X T F L O W   ~  version 24.04.4
    
    Launching `main.nf` [golden_booth] DSL2 - revision: 1be20c05ad
    
    executor >  local (1)
    [cd/5f5f96] process > COUNT [100%] 1 of 1 ✔
    
    $ find out/ -type f
    out/pipeline/project_3/project_3_counts.csv
    out/pipeline/project_1/project_1_counts.csv
    out/pipeline/project_2/project_2_counts.csv
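
    The saveAs description quoted above also says that returning null from the closure skips a file. As a minimal sketch, assuming the process also wrote a log file that should not be published (run.log and the run_log emit name are hypothetical, not part of the original pipeline), the closure could route only the counts files into their project folders:

    ```groovy
    params.outdir = './out'

    process COUNT {

        publishDir(
            path: "${params.outdir}/pipeline",
            mode: 'copy',
            saveAs: { filename ->
                // Publish only <project>_counts.csv files, each under <project>/
                if (filename.endsWith('_counts.csv')) {
                    // String minus removes the first occurrence of the suffix
                    def project = filename - '_counts.csv'
                    return "${project}/${filename}"
                }
                // Returning null means this output is not published
                return null
            }
        )

        output:
        path "*_counts.csv", emit: counts
        path "run.log", emit: run_log   // hypothetical extra output

        script:
        """
        touch project_{1,2}_counts.csv
        touch run.log
        """
    }

    workflow {
        COUNT()
    }
    ```

    With this variant, run.log should stay in the task's work directory and remain available to downstream processes through the run_log channel; only the counts files are copied under out/pipeline.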
    

    Note that you can also set the publishDir directive in a configuration file, for example:

    process {
    
        withName: 'COUNT' {
    
            publishDir = [
                path: { "${params.outdir}/pipeline" },
                mode: 'copy',
                saveAs: { filename ->
                    "${filename.replaceFirst('_counts.csv', '')}/${filename}"
                }
            ]
        }
    }