
Nextflow: Python script in custom module not found during process execution


I'm working on a Nextflow pipeline that uses a custom module. The module includes a Python script (script_1.py) located in the nested folder <module-dir>/resources/usr/bin. I have made script_1.py executable and set nextflow.enable.moduleBinaries to true in ./nextflow.config. However, when I run the pipeline, the process fails with an error saying the Python script cannot be found.

Module directory structure

modules/
└── local/
    └── mymodule/
        ├── environment.yml
        ├── main.nf
        ├── resources/
        │   └── usr/
        │       └── bin/
        │           └── script_1.py
        └── work/

Error message

Here's the error I get when running the pipeline:

Caused by:
  Process `MyProcess (1)` terminated with an error exit status (2)

Command executed:

  python script_1.py

  cat <<-END_VERSIONS > versions.yml
      "MyProcess":
          python: $(python --version 2>&1 | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  2

Command output:
  (empty)

Command error:
  python: can't open file 'script_1.py': [Errno 2] No such file or directory

What I tried

In my main.nf, I had the following:

#!/usr/bin/env nextflow

include { MyProcess } from './modules/local/mymodule/main.nf'

And in my ./modules/local/mymodule/main.nf, I had the following:

#!/usr/bin/env nextflow

process MyProcess {
    conda "${moduleDir}/environment.yml"

    input:
    path(input_folder)
    
    output:
    path("data.csv")
    path "versions.yml"                , emit: versions

    script:
    """
    python script_1.py ${input_folder}

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        python: \$(python --version 2>&1 | sed 's/Python //g')
    END_VERSIONS
    """ 
    
}

But script_1.py is never found, and the process fails.

My question

Is this the correct way to reference scripts bundled with a module in a Nextflow pipeline?


Solution

  • I suspect this is because you are not treating the Python script as a binary, which is what the wording in the docs suggests.

    You call python script_1.py, which tells the Python interpreter to look for script_1.py in the task's working directory rather than invoking the script as a binary found on the PATH. Instead, call the script directly as script_1.py, and make sure the shebang line in the script points to the correct interpreter.

    I usually just pass scripts in as value channels since it's easier and doesn't require Wave containers on GCP/AWS, so this answer is based only on my reading of the docs. Hope it works.
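
The difference between the two ways of calling the script can be reproduced outside Nextflow. The following is a minimal sketch, not the actual task environment: demo_dir and work_dir are hypothetical stand-ins for <module-dir>/resources/usr/bin and the task working directory, the script body is a placeholder, and it assumes python3 is available on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins (not real Nextflow paths):
#   demo_dir ~ <module-dir>/resources/usr/bin
#   work_dir ~ the task working directory
demo_dir="$(mktemp -d)"
work_dir="$(mktemp -d)"

# A placeholder script with a shebang, made executable like script_1.py.
printf '%s\n' '#!/usr/bin/env python3' 'print("hello from script_1.py")' \
    > "$demo_dir/script_1.py"
chmod +x "$demo_dir/script_1.py"

# Put the script directory on PATH, mimicking what the docs describe
# for module binaries.
export PATH="$demo_dir:$PATH"

# Fails: the interpreter treats the argument as a file path relative to
# the current directory and does not search PATH for it.
via_interpreter_status=0
(cd "$work_dir" && python3 script_1.py) || via_interpreter_status=$?

# Works: the shell resolves the command name through PATH, and the
# shebang line selects the interpreter.
direct_output="$(cd "$work_dir" && script_1.py)"
direct_status=$?

echo "python3 script_1.py exit status: $via_interpreter_status"
echo "script_1.py exit status: $direct_status, output: $direct_output"
```

If this reasoning carries over to the pipeline, the only change needed in the process script block would be replacing python script_1.py ${input_folder} with script_1.py ${input_folder}, provided the first line of the script is a shebang such as #!/usr/bin/env python.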