I am trying to build a Nextflow pipeline where I need to read a CSV file and use its content to run a Python script. Right now it is just the beginning of the entire pipeline.
My CSV file (objects.csv) looks like this:
Filename,p
file1,12345
file2,51512
file3,67223
...
It contains a filename and a parameter p. The CSV file itself, as well as the files named in the first column, are stored in another folder. At the moment I would like to access them via relative paths.
What I want to achieve is: for each row in the CSV, call a Python script like this:
python3 myscript.py --input_file file1.abc --p 12345
My current attempt in Nextflow looks like this:
params.csv = "../../data/objects.csv" // CSV location
params.loc_abc_files = "../../data/objects/" // folder with all available .abc files

process runSimulation {
    input:
    tuple path(abc_file), val(p)

    script:
    """
    python3 myscript.py --input $abc_file --p $p
    """
}

workflow {
    Channel
        .fromPath(params.csv)
        .splitCsv(sep: ",", header: true)
        .map { row -> tuple(params.loc_abc_files + row.Filename + ".abc", row.Parameter) }
        .set { input_files }

    runSimulation(input_files)
}
Right now Nextflow complains that I have not passed a 'valid Path value'. That is probably true, since I just build a string inside the .map() operator. How do I set this up properly, i.e. how do I pass it to the process as a Path value?
I also want to make sure that the process runSimulation() can be executed in parallel on all the files within the CSV if the resources are available.
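As you said, the .map() builds a plain string, but the path() input qualifier expects a Path.
You can wrap the string in file() to convert it into a Path. Also note that the CSV header is p, so the column has to be read as row.p rather than row.Parameter, and the script option you want is --input_file rather than --input.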
For example:
params.csv = "../../data/objects.csv"
params.loc_abc_files = "../../data/objects/"

process runSimulation {
    input:
    tuple path(abc_file), val(p)

    script:
    """
    python3 myscript.py --input_file ${abc_file} --p ${p}
    """
}

workflow {
    Channel
        .fromPath(params.csv)
        .splitCsv(sep: ",", header: true)
        // file() turns the assembled string into a Path; row.p matches the CSV header
        .map { row -> tuple(file(params.loc_abc_files + row.Filename + ".abc"), row.p) }
        .set { input_files }

    runSimulation(input_files)
}
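
A slightly more defensive variant of the same idea is sketched below. It rests on two assumptions that are not stated in the question: that the relative paths are meant relative to the pipeline script (hence projectDir), and that myscript.py sits next to the pipeline script. Each task runs in its own work directory, so a bare "python3 myscript.py" would not find the script there. The checkIfExists option makes Nextflow fail early if the CSV or one of the .abc files is missing.

```nextflow
// Assumption: the data folder is located relative to the pipeline script
params.csv           = "${projectDir}/../../data/objects.csv"
params.loc_abc_files = "${projectDir}/../../data/objects"

process runSimulation {
    // maxForks 4   // optional cap on concurrent tasks (illustrative value)

    input:
    tuple path(abc_file), val(p)

    script:
    // Assumption: myscript.py is stored next to the pipeline script
    """
    python3 ${projectDir}/myscript.py --input_file ${abc_file} --p ${p}
    """
}

workflow {
    Channel
        .fromPath(params.csv, checkIfExists: true)
        .splitCsv(header: true)
        .map { row ->
            // Build the Path and fail immediately if the file does not exist
            tuple(file("${params.loc_abc_files}/${row.Filename}.abc", checkIfExists: true), row.p)
        }
        .set { input_files }

    runSimulation(input_files)
}
```

Regarding parallel execution: every tuple emitted by the channel becomes its own runSimulation task, and Nextflow schedules independent tasks in parallel as far as the executor's resources allow, so no extra code is needed for that. The commented-out maxForks line shows one way to limit the number of concurrent tasks if that is ever required.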