I am working on a Nextflow pipeline that has several processes controlled by the main pipeline. One of my processes generates one or more folders whose names are project names. I publish these folders into the pipeline's output folder. The next process generates some CSV files whose prefixes match the folder names generated in the previous step of the pipeline. I want to publish these files inside the corresponding directories.
process COUNT {
    input:
    path stats_file

    output:
    stdout emit: result
    path "*counts.csv", type: "file", emit: counts

    script:
    """
    # Bash call of a Python script that takes one stats file, splits it into
    # multiple *counts.csv files, and writes them as <project_name>_counts.csv
    touch project_1_counts.csv
    touch project_2_counts.csv
    """
}
In the output directory out/pipeline, I have two folders created by the previous step: out/pipeline/project_1 and out/pipeline/project_2. I want project_1_counts.csv to be saved in the project_1 directory and project_2_counts.csv to be saved in the project_2 directory.
I know that I have to write a publishDir directive before the input/output sections, but I don't know how to write one that matches the script's output files to the corresponding folders.
You can use the saveAs option with the publishDir directive for this:
saveAs: A closure which, given the name of the file being published, returns the actual file name or a full path where the file is required to be stored. This can be used to rename or change the destination directory of the published files dynamically by using a custom strategy. Return the value null from the closure to not publish a file. This is useful when the process has multiple output files, but you want to publish only some of them.
For example:
params.outdir = './out'

process COUNT {

    publishDir (
        path: "${params.outdir}/pipeline",
        mode: 'copy',
        saveAs: { filename ->
            "${filename.replaceFirst('_counts.csv', '')}/${filename}"
        }
    )

    output:
    path "*_counts.csv", emit: counts

    script:
    """
    touch project_{1,2,3}_counts.csv
    """
}

workflow {
    COUNT()
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 24.04.4
Launching `main.nf` [golden_booth] DSL2 - revision: 1be20c05ad
executor > local (1)
[cd/5f5f96] process > COUNT [100%] 1 of 1 ✔
$ find out/ -type f
out/pipeline/project_3/project_3_counts.csv
out/pipeline/project_1/project_1_counts.csv
out/pipeline/project_2/project_2_counts.csv
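If the process also emits files that should not be published (the stdout or other outputs, say), the closure can return null for them, as the quoted documentation describes. The following is only a minimal sketch in plain Groovy, so the mapping can be tried outside Nextflow; the closure name saveAsCounts and the file name summary.txt are hypothetical and not part of the original pipeline:

```groovy
// Hypothetical stand-alone version of a saveAs closure.
// Nextflow would call it with the name of each file being published.
def saveAsCounts = { String filename ->
    String suffix = '_counts.csv'
    // Returning null tells publishDir not to publish this file
    if (!filename.endsWith(suffix)) {
        return null
    }
    // Strip the suffix to recover the project name, e.g. "project_1"
    String project = filename.substring(0, filename.length() - suffix.length())
    // Relative path under the publishDir path: <project>/<filename>
    return project + '/' + filename
}

// Try the mapping on a few example file names (assumed, for illustration)
['project_1_counts.csv', 'project_2_counts.csv', 'summary.txt'].each { name ->
    println "${name} -> ${saveAsCounts(name)}"
}
```

The body of this closure could be dropped into the saveAs option shown above. Using endsWith and substring instead of replaceFirst avoids treating the suffix as a regular expression, which is a design choice of this sketch rather than a requirement.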
Note that you can also set the publishDir directive in a configuration file, for example:
process {
    withName: 'COUNT' {
        publishDir = [
            path: { "${params.outdir}/pipeline" },
            mode: 'copy',
            saveAs: { filename ->
                "${filename.replaceFirst('_counts.csv', '')}/${filename}"
            }
        ]
    }
}