nextflow

How to pass filenames and parameters from a CSV to a Nextflow process?


I am trying to build a Nextflow pipeline where I need to read a CSV file and use its content to run a Python script. Right now it is just the beginning of the entire pipeline.

My CSV file (objects.csv) looks like this:

Filename,p
file1,12345
file2,51512
file3,67223
...

Each row contains a filename (without extension) and a value for the parameter p.

The CSV file itself, as well as the files named in the first column, are stored in another folder. At the moment I would like to access them via relative paths.

What I want to achieve is: for each row in the CSV, call a Python script like this:

python3 myscript.py --input_file file1.abc --p 12345

My current attempt in Nextflow looks like this:

params.csv = "../../data/objects.csv"          // CSV location
params.loc_abc_files = "../../data/objects/"   // folder with all available .abc files

process runSimulation {
    input:
    tuple path(abc_file), val(p)

    script:
    """
    python3 myscript.py --input $abc_file --p $p
    """
}

workflow {
    
    Channel
        .fromPath(params.csv)
        .splitCsv(sep: ",", header: true)
        .map { row -> tuple(params.loc_abc_files + row.Filename + ".abc", row.Parameter) }
        .set { input_files }

    runSimulation(input_files)

}

Right now Nextflow complains that I have not passed a 'valid Path value'. That is probably true, since I only build a string inside the .map() operator. How do I set this up properly, i.e. how do I pass the file to the process as a Path value?

I also want to make sure that the process runSimulation() can be executed in parallel on all the files within the CSV if the resources are available.


Solution

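  • As you said, the .map() closure only builds a string, but the path input qualifier expects a Path object.

    You can use file() to convert the string to a Path. Also note that, with header: true, each row is keyed by the column names of the CSV, so the parameter has to be read as row.p; row.Parameter does not exist for this file.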

    For example:

    params.csv = "../../data/objects.csv"
    params.loc_abc_files = "../../data/objects/"
    
    process runSimulation {
        input:
        tuple path(abc_file), val(p)
    
        script:
        """
        python3 myscript.py --input_file ${abc_file} --p ${p}
        """
    }
    
    workflow {
        Channel
            .fromPath(params.csv)
            .splitCsv(sep: ",", header: true)
            .map { row -> tuple(file(params.loc_abc_files + row.Filename + ".abc"), row.p) }
            .set { input_files }
    
        runSimulation(input_files)
    }
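
    On the parallelism question: Nextflow launches one runSimulation task per item emitted by the channel, i.e. one per CSV row, and runs these tasks concurrently as far as the executor's resources allow, so no extra code is needed for that. A slightly more defensive variant of the example above might look like the sketch below. It assumes that you want the run to fail early when the CSV or one of the .abc files is missing (checkIfExists: true) and that you optionally want to cap concurrency with the maxForks directive; the value 4 is an arbitrary placeholder, not a recommendation.

    ```nextflow
    params.csv = "../../data/objects.csv"
    params.loc_abc_files = "../../data/objects/"

    process runSimulation {
        // Optional: limit how many tasks of this process run at once.
        // 4 is an arbitrary example value; drop the line to leave it uncapped.
        maxForks 4

        input:
        tuple path(abc_file), val(p)

        script:
        """
        python3 myscript.py --input_file ${abc_file} --p ${p}
        """
    }

    workflow {
        Channel
            // Fail immediately if the CSV itself cannot be found
            .fromPath(params.csv, checkIfExists: true)
            .splitCsv(sep: ",", header: true)
            .map { row ->
                // Turn the string into a Path and fail early if the file is missing
                def abc = file(params.loc_abc_files + row.Filename + ".abc", checkIfExists: true)
                tuple(abc, row.p)
            }
            .set { input_files }

        runSimulation(input_files)
    }
    ```

    Relative paths given to file() should be resolved against the directory from which you launch nextflow, so the "../../data/..." defaults only work when the pipeline is started from the expected location. Overriding them on the command line with absolute paths (--csv and --loc_abc_files) avoids that dependency.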