amazon-web-services docker containers aws-batch micromamba

Docker container using micromamba base image is not working on AWS Batch (Issue with /usr/local/bin/_entrypoint.sh)


Here's my Dockerfile:

# v2024.3.4
# =============================
FROM --platform=linux/amd64 mambaorg/micromamba:1.5.6

SHELL ["/usr/local/bin/_dockerfile_shell.sh"]

WORKDIR /tmp/

# Data
USER root
RUN mkdir -p /volumes/
RUN mkdir -p /volumes/input
RUN mkdir -p /volumes/output
RUN mkdir -p /volumes/database

ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8
ENV MPLBACKEND agg
ENV XDG_CONFIG_HOME /home/qiime2


# Retrieve repository
USER $MAMBA_USER
RUN micromamba install -y -n base -c conda-forge wget ca-certificates

ARG MAMBA_DOCKERFILE_ACTIVATE=1
RUN wget --no-check-certificate https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2024.2-py38-linux-conda.yml

# Install dependencies
RUN micromamba install -y -n base \
    -c https://packages.qiime2.org/qiime2/2024.2/amplicon/released \
    -c bioconda \
    -c conda-forge \
    -c defaults \
    -f /tmp/qiime2-amplicon-2024.2-py38-linux-conda.yml && \
    micromamba clean -a -y -f

RUN rm -rf /tmp/qiime2-amplicon-2024.2-py38-linux-conda.yml
# RUN qiime dev refresh-cache


ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]

Here's my job definition on AWS:

{
  "jobDefinitionName": "qiime2-classify-vsearch__16S-rRNA_JB021824-gtdb_ssu_all_r207",
  "type": "container",
  "containerProperties": {
    "image": "jolespin/qiime2-amplicon:2024.2",
    "command": [
      "mkdir -p",
      "/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/",
      "&&",
      "qiime",
      "feature-classifier",
      "classify-consensus-vsearch",
      "--i-query",
      "/volumes/input/16S-rRNA_JB021824/seqs.qza",
      "--i-reference-reads",
      "/volumes/database/gtdb_ssu_all_r207/seqs.qza",
      "--i-reference-taxonomy",
      "/volumes/database/gtdb_ssu_all_r207/tax.qza",
      "--p-threads",
      "16",
      "--o-classification",
      "/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/classification.qza",
      "--o-search-results",
      "/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207/search-results.qza",
      "--verbose"
    ],
    "jobRoleArn": "arn:aws:iam::[redacted_identifier]:role/ecsTaskExecutionRole",
    "executionRoleArn": "arn:aws:iam::[redacted_identifier]:role/ecsTaskExecutionRole",
    "volumes": [
      {
        "name": "efs-volume-database",
        "efsVolumeConfiguration": {
          "fileSystemId": "fs-[redacted_identifier]",
          "transitEncryption": "ENABLED",
          "rootDirectory": "databases/qiime2/"
        }
      },
      {
        "name": "efs-volume-input",
        "efsVolumeConfiguration": {
          "fileSystemId": "fs-[redacted_identifier]",
          "transitEncryption": "ENABLED",
          "rootDirectory": "projects/Amplicon/Data/"
        }
      },
      {
        "name": "efs-volume-output",
        "efsVolumeConfiguration": {
          "fileSystemId": "fs-[redacted_identifier]",
          "transitEncryption": "ENABLED",
          "rootDirectory": "projects/Amplicon/Analysis/"
        }
      }
    ],
    "mountPoints": [
      {
        "sourceVolume": "efs-volume-database",
        "containerPath": "/volumes/database",
        "readOnly": true
      },
      {
        "sourceVolume": "efs-volume-input",
        "containerPath": "/volumes/input",
        "readOnly": true
      },
      {
        "sourceVolume": "efs-volume-output",
        "containerPath": "/volumes/output",
        "readOnly": false
      }
    ],
    "environment": [],
    "ulimits": [],
    "resourceRequirements": [
      {
        "value": "16.0",
        "type": "VCPU"
      },
      {
        "value": "65536",
        "type": "MEMORY"
      }
    ],
    "networkConfiguration": {
      "assignPublicIp": "ENABLED"
    },
    "fargatePlatformConfiguration": {
      "platformVersion": "LATEST"
    },
    "ephemeralStorage": {
      "sizeInGiB": 40
    }
  },
  "tags": {
    "Name": "qiime2-classify-vsearch__16S-rRNA_JB021824-gtdb_ssu_all_r207"
  },
  "platformCapabilities": [
    "FARGATE"
  ]
}

When I ran the job, I got the following error:

/usr/local/bin/_entrypoint.sh: line 24: exec: mkdir -p: not found

When I run the container locally, it finds mkdir just fine:

docker run --name test --rm -it jolespin/qiime2-amplicon:2024.2 bash

(base) mambauser@a51e78ef660d:/tmp$ which mkdir
/usr/bin/mkdir

How can I get my Docker container to work as expected with AWS Batch?


Solution

  • There are two bugs here, both caused by one issue. The "command" in the job definition should be an array that can be passed to Popen directly: an executable that can be run, followed by its arguments, one per array element. No shell parses it along the way.

    That leads to the first bug: "mkdir -p" should be split up into "mkdir", "-p". The error message is saying that there is no executable named "mkdir -p" (with the space), not that mkdir is missing, which is why which mkdir works fine in your local shell. Splitting it will allow the first command to run, since "mkdir" is an executable, but further on you have "&&", which is shell syntax to run multiple commands. Without a shell, "&&" and everything after it is just passed to mkdir as more arguments. Since you're really trying to run a shell command, you can be explicit about it, and run a command like:

        "command": [
          "/bin/bash", "-c",
          "mkdir -p /path/to/make && qiime --options \"Example String\""
        ],
    

    This will launch bash to parse the entire string and run each command in turn. Note that you're now passing all of the options as one string, so you'll need to quote any argument that contains a space, like I've done with "Example String" here.
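
To see the difference locally without submitting a job, here is a minimal Python sketch. It assumes a POSIX system with mkdir and a shell available. The helper names run_argv and bash_command are made up for this illustration and are not part of AWS Batch or micromamba. It first runs the broken argv form and the bash -c form with subprocess, then builds the "command" array for the job definition from the qiime arguments in the question, using shlex.join so that any argument containing a space gets quoted.

```python
import json
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path

# Shell for the local demo only; falls back to /bin/sh if bash is not installed.
LOCAL_SHELL = shutil.which("bash") or "/bin/sh"


def run_argv(argv):
    """Run argv with no shell involved, similar to what `exec "$@"` does."""
    try:
        return subprocess.run(argv, check=False).returncode
    except FileNotFoundError:
        # argv[0] is looked up as a single executable name, spaces included.
        return "not found"


def bash_command(*commands, shell="/bin/bash"):
    """Quote each argv list, chain them with &&, and wrap them in `shell -c`."""
    script = " && ".join(shlex.join(cmd) for cmd in commands)
    return [shell, "-c", script]


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "out dir" / "nested")  # contains a space on purpose

    # Broken: "mkdir -p" is treated as the name of one executable.
    broken = run_argv(["mkdir -p", target])

    # Fixed: a shell parses the string, so && and the quoting both work.
    fixed = run_argv(
        bash_command(["mkdir", "-p", target], ["test", "-d", target], shell=LOCAL_SHELL)
    )
    created = Path(target).is_dir()

print("argv form:", broken)
print("bash -c form: exit code", fixed, "directory created:", created)

# The same helper applied to the command from the question.
out_dir = "/volumes/output/taxonomic_classification/16S-rRNA_JB021824/vsearch/gtdb_ssu_all_r207"
db_dir = "/volumes/database/gtdb_ssu_all_r207"
qiime = [
    "qiime", "feature-classifier", "classify-consensus-vsearch",
    "--i-query", "/volumes/input/16S-rRNA_JB021824/seqs.qza",
    "--i-reference-reads", f"{db_dir}/seqs.qza",
    "--i-reference-taxonomy", f"{db_dir}/tax.qza",
    "--p-threads", "16",
    "--o-classification", f"{out_dir}/classification.qza",
    "--o-search-results", f"{out_dir}/search-results.qza",
    "--verbose",
]
job_command = bash_command(["mkdir", "-p", out_dir], qiime)

# Paste the printed array into containerProperties in the job definition.
print(json.dumps({"command": job_command}, indent=2))
```

The printed JSON is a three-element array: "/bin/bash", "-c", and one string holding the mkdir and qiime commands joined by &&. Building it this way is optional; writing the string by hand works just as well as long as the quoting is right.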