I'm running a Nextflow pipeline on AWS that executes a series of processes, each in its own Docker container. The pipeline works well locally, but when I run it on AWS I get a strange "command not found" error.
Note: the Docker images are correctly pushed to their repositories in the AWS container registry, and the AWS workers find them when running the pipeline.
This is the full script the process runs:
#!/bin/bash -ue
export SENTIEON_LICENSE=xxx.x.xxx.xx && # masking it for obvious reasons
bwa mem -M -R "@RG\tID:23456789929800\tSM:23456789929800\tPL:ILLUMINA" -t 32 /mnt/buckets/reference/hg19/ucsc.hg19.mod.mit.fasta test_R1.fastq.gz test_R2.fastq.gz > 23456789929800.sam &&
sentieon util sort -o 23456789929800_sorted.bam -t 32 --sam2bam -i 23456789929800.sam &&
samtools index 23456789929800_sorted.bam
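A fail-fast check at the top of the script would surface this kind of problem before bwa and sentieon spend time on their steps. A minimal sketch, assuming bash as in the shebang above; require_tools is a hypothetical helper, not part of Nextflow or the original pipeline:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original pipeline): report every
# required executable that cannot be resolved on $PATH, then return non-zero
# if any were missing so the caller can abort early.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v is a shell builtin, so it checks exactly what this shell
    # would execute, without depending on an external `which` binary.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "ERROR: '$tool' not found on PATH=$PATH" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called before the bwa line, e.g. require_tools bwa sentieon samtools || exit 127 (127 is the exit status bash itself uses for "command not found"). The error message then includes the PATH the task actually saw.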
I'm getting this error:
.command.sh: line 5: samtools: command not found
When I start the same image interactively:
docker run --rm -it --privileged <name:version>
and check for the samtools executable, it is actually there:
root@a387481f5957:/task# which samtools
/usr/local/bin/samtools
I tried specifying the full path. I tried setting a variable for it. I tried everything, but the task just won't find it.
I also launched a micro instance, pulled the Docker image, and ran the container interactively, and there samtools IS on the $PATH.
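To compare what the Nextflow task sees with the interactive session, a diagnostic like the following could be added temporarily to the process script. It is a sketch: diagnose_tool is a hypothetical helper, and /usr/local/bin/samtools is simply the path that which reported in the interactive container.

```shell
#!/bin/bash
# Hypothetical debugging helper: print what the task's own shell sees, so the
# output can be compared with an interactive `docker run` of the same image.
diagnose_tool() {
  local tool="$1" expected="$2" resolved
  echo "PATH=$PATH"
  # Does the name resolve through PATH in this shell?
  if resolved=$(command -v "$tool"); then
    echo "$tool resolves to: $resolved"
  else
    echo "$tool is not resolvable on PATH"
  fi
  # Independently of PATH, is the expected file present and executable?
  if [ -x "$expected" ]; then
    echo "$expected exists and is executable"
  else
    echo "$expected is missing or not executable"
  fi
}
```

Calling diagnose_tool samtools /usr/local/bin/samtools inside the task separates two cases: if the file exists but PATH lacks /usr/local/bin, the PATH differs at runtime; if the file itself is missing, the task is not seeing the same filesystem contents as the interactive container.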
What could be happening? Can anyone help me?
I'm posting the answer to my own question for future readers:
The AWS documentation states, in one paragraph, that certain tools have to be installed via conda to be found on the PATH. It turns out samtools was one of them.
It now works like a charm after installing it with conda.