I'm working on a Nextflow pipeline that uses a custom module. This module includes a Python script (script_1.py) located in a nested folder, <module-dir>/resources/usr/bin. script_1.py has been made executable and nextflow.enable.moduleBinaries has been set to true in the ./nextflow.config file. However, when I try to run the pipeline, I get an error that the Python script cannot be found.
Module directory structure
modules/
└── local/
    └── mymodule/
        ├── environment.yml
        ├── main.nf
        ├── resources/
        │   └── usr/
        │       └── bin/
        │           └── script_1.py
        └── work/
Error message
Here's the error I get when running the pipeline:
Caused by:
  Process `MyProcess (1)` terminated with an error exit status (2)
Command executed:
  python script_1.py
  cat <<-END_VERSIONS > versions.yml
  "MyProcess":
      python: $(python --version 2>&1 | sed 's/Python //g')
  END_VERSIONS
Command exit status:
  2
Command output:
  (empty)
Command error:
  python: can't open file 'script_1.py': [Errno 2] No such file or directory
What I tried
In my main.nf, I had the following:
#!/usr/bin/env nextflow
include { MyProcess } from './modules/local/mymodule/main.nf'
And in my ./modules/local/mymodule/main.nf, I had the following:
#!/usr/bin/env nextflow
process MyProcess {
    conda "${moduleDir}/environment.yml"

    input:
    path(input_folder)

    output:
    path("data.csv")
    path "versions.yml", emit: versions

    script:
    """
    python script_1.py ${input_folder}

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        python: \$(python --version 2>&1 | sed 's/Python //g')
    END_VERSIONS
    """
}
But script_1.py is never found, and the process fails.
My question
Is this the correct way to reference such scripts in a module in Nextflow pipelines?
I suspect this is because you are not treating the Python script as a binary, which is what the wording in the docs suggests. You call python script_1.py, which tells Python to look for the script in the current working directory rather than invoking the script as a binary. Instead, you should call the script directly as script_1.py ${input_folder}, and make sure the shebang in the script points to the correct interpreter.
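To illustrate the difference between the two lookups, here is a minimal Python sketch. It assumes (based on my reading of the docs, not something I have verified) that Nextflow exposes the module's resources/usr/bin directory to the task through PATH. The temporary directories, the simulate_lookup helper, and the contents written to script_1.py are all hypothetical stand-ins, not your real module or script.

```python
#!/usr/bin/env python3
"""Sketch: why `python script_1.py` fails while a bare `script_1.py` can be found."""
import os
import shutil
import stat
import tempfile
from pathlib import Path


def simulate_lookup():
    """Return (found_in_cwd, found_on_path) for a script that lives only in a bin dir."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for <module-dir>/resources/usr/bin.
        bin_dir = Path(tmp) / "resources" / "usr" / "bin"
        bin_dir.mkdir(parents=True)
        script = bin_dir / "script_1.py"
        # The shebang lets the OS choose the interpreter when the file is run by name.
        script.write_text("#!/usr/bin/env python3\nprint('hello from script_1')\n")
        script.chmod(script.stat().st_mode | stat.S_IXUSR)

        # Hypothetical stand-in for the task work directory, which has no copy of the script.
        work_dir = Path(tmp) / "work"
        work_dir.mkdir()

        # `python script_1.py` resolves the file name relative to the working directory.
        found_in_cwd = (work_dir / "script_1.py").exists()

        # A bare `script_1.py` is resolved by searching the directories listed on PATH.
        search_path = os.pathsep.join([str(bin_dir), os.environ.get("PATH", "")])
        found_on_path = shutil.which("script_1.py", path=search_path) is not None
        return found_in_cwd, found_on_path


if __name__ == "__main__":
    in_cwd, on_path = simulate_lookup()
    print(f"visible to `python script_1.py`: {in_cwd}")
    print(f"visible to bare `script_1.py`:   {on_path}")
```

The point of the sketch is only that the Python interpreter never consults PATH when it is handed a file name, so an executable script with a shebang has to be called by its bare name to be picked up from the module's bin directory.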
I usually just read scripts in as value channels, since that's easier and you don't need to use Wave containers on GCP/AWS, so this answer is based only on my interpretation of the docs. Hope it works.