I'm running a pipeline using Nextflow on Google Batch. However, I'm getting the following error:
ERROR ~ Error executing process > 'PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)'
Caused by:
Process `PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)` terminated with an error exit status (null)
Command executed:
mkdir output
nlrexpress.py \
--input All_Candidate_Soybean_Prots_Simplified_Sorted.fasta \
--outdir ./output \
--module all
mv output/*.short.output.txt ./
Command exit status:
null
Command output:
15/06/2023 15:36:31: ############ NLRexpress started ############
15/06/2023 15:36:31: Input FASTA: All_Candidate_Soybean_Prots_Simplified_Sorted.fasta
15/06/2023 15:36:31: Checking FASTA file - started
15/06/2023 15:36:31: Checking FASTA file - done
15/06/2023 15:36:31: Running JackHMMER - started
Command error:
time="2023-06-15T15:39:22Z" level=error msg="error waiting for container: "
Work dir:
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
The module nf file is here:
process NLREXPRESS {
    tag "$sample_id"
    maxForks 1
    container = 'dthorbur1990/nlrexpress:latest'
    cpus { 4 * task.attempt }
    memory { 12.GB * task.attempt }
    disk "15.GB"
    publishDir(
        path: "${params.PlantDir}",
        mode: 'copy',
    )

    input:
    tuple val(sample_id), path(peptides)

    output:
    path "*.short.output.txt", emit: nlre_out

    script:
    """
    mkdir output
    nlrexpress.py \\
        --input ${peptides} \\
        --outdir ./output \\
        --module ${params.NE_Modules}
    mv output/*.short.output.txt ./
    """
}
The process was running without error when I ran it locally, and I have rebuilt the container and it works as intended.
What confuses me is that the workDir doesn't contain either of the .command.{out,err} files, which suggests (to me at least) that the process isn't running. However, the "Command output" section of the error message shows the correct first few lines of the tool's output.
Here is the workDir:
gsutil ls gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.begin
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.run
gs://rb-rnaseq/workDir/6e/090e663de08b69ce6c9506dc4975c1/.command.sh
And here is the end of the log file regarding the NLREXPRESS module:
All_Candidate_Soybean_Prots_Simplified_Sorted)","q3Label":"PLANT:NLREXPRESS (All_Candidate_Soybean_Prots_Simplified_Sorted)"},"writes":null},{"cpuUsage":null,"process":"ORIENTATION","mem":null,"memUsage":null,"timeUsage":null,"vmem":null,"reads":null,"cpu":null,"time":null,"writes":null}]
I'm at a loss. I've tried increasing the memory, but that doesn't seem to have helped. Any ideas? Happy to add the .nextflow.log file if that would be helpful.
I'm not sure if I have an answer for you, but I think this behavior might have something to do with how Nextflow runs the job. If you look at the end of the nxf_main function in the .command.run script, you'll see something like:
nxf_main() {
    ...
    set +e
    ctmp=$(set +u; nxf_mktemp /dev/shm 2>/dev/null || nxf_mktemp $TMPDIR)
    local cout=$ctmp/.command.out; mkfifo $cout
    local cerr=$ctmp/.command.err; mkfifo $cerr
    tee .command.out < $cout &
    tee1=$!
    tee .command.err < $cerr >&2 &
    tee2=$!
    ( nxf_launch ) >$cout 2>$cerr &
    pid=$!
    wait $pid || nxf_main_ret=$?
    wait $tee1 $tee2
    nxf_unstage
}
When errexit is enabled (set -e), any command that returns a non-zero exit status immediately terminates the script. By using set +e, the script explicitly disables this behavior, so if an earlier step fails (for example, nxf_mktemp or mkfifo), execution simply carries on to the next line. This means that .command.out and .command.err may not necessarily be created despite the Docker container being run (via nxf_launch).
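To make the errexit difference concrete, here is a minimal sketch that is not taken from Nextflow itself. The helper name run_with is hypothetical; it runs the same tiny snippet in a child shell with errexit on and then off, and captures the exit status with the same `|| var=$?` idiom that nxf_main uses.

```shell
#!/usr/bin/env bash
# Hypothetical demo, not part of Nextflow: run a snippet in a child shell with
# errexit switched on (-e) or off (+e) and report what happened.
run_with() {
    local flag=$1      # either "-e" or "+e"
    local status=0
    # "false" always fails; with -e the child shell stops right there,
    # with +e it carries on and reaches the echo.
    bash -c "set $flag; false; echo 'reached the end'" || status=$?
    echo "exit status: $status"
}

with_errexit=$(run_with -e)
without_errexit=$(run_with +e)

echo "set -e -> $with_errexit"
echo "set +e -> $without_errexit"
```

With set -e the child shell exits at the failing command and never prints anything, while with set +e it runs to the end and returns 0, which is how a failed setup step can go unnoticed.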
So I wonder if there is a problem with the size of your /dev/shm? You could try using runOptions in the docker configuration scope to increase the --shm-size. For example, add the following to your nextflow.config:
docker {
    enabled = true
    runOptions = '--shm-size 2g'
}
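To check whether /dev/shm really is the bottleneck, something like the following could be run inside the container before and after changing the setting. This is only a sketch: fs_size_kb is a hypothetical helper name, and it assumes a POSIX df and awk are available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the total and available size (in KiB) of the
# filesystem that backs the given directory (defaults to /dev/shm).
fs_size_kb() {
    local dir=${1:-/dev/shm}
    # -P forces the one-line-per-filesystem POSIX layout, -k reports KiB.
    # Row 2 is the data row: column 2 is the total size, column 4 is available.
    df -Pk "$dir" | awk 'NR == 2 { print $2, $4 }'
}

# Only query /dev/shm when it exists (it may be absent on some systems).
if [ -d /dev/shm ]; then
    echo "/dev/shm total/available KiB: $(fs_size_kb /dev/shm)"
fi
```

Docker's default /dev/shm is 64 MB, so a total of roughly 65536 KiB inside the container would suggest the runOptions setting has not taken effect.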