Tags: git, ssh, amazon-elastic-beanstalk, pipenv, amazon-linux-2

Configuring Elastic Beanstalk for SSH access to private git repo using Amazon Linux 2 hooks


Suppose we have a custom Python package, called shared_package, in a private repository hosted on github or bitbucket. Our private repository is configured for read-only access via SSH, as described e.g. here for github and here for bitbucket.

Another one of our projects, aptly named dependent_project, depends on this shared_package and needs to be deployed to AWS Elastic Beanstalk (EB). Our environment uses the latest "Python on Amazon Linux 2" platform, and we use pipenv as our package manager.

For various reasons, it would be most convenient for us to install shared_package directly from our online git repository, as described here for pipenv and here for pip. The Pipfile for our dependent_project looks like this:

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
shared_package = {git = "ssh://git@bitbucket.org/our_username/shared_package.git", editable = true, ref = "2021.0"}

[dev-packages]
awsebcli = "*"

[requires]
python_version = "3.8"

This works well on our local development systems, but when deploying dependent_project to Elastic Beanstalk, the pipenv installation fails with: Permission denied (publickey).

This leads to the question:

How to configure an Elastic Beanstalk environment, using Amazon Linux 2 platform hooks, so that pipenv can successfully install a package from a private online git repo, via SSH?

Some pieces of the puzzle can be found in existing discussions of similar setups, but those do not use Amazon Linux 2 platform hooks.


Solution

    Summary

    Assume we have defined the following Elastic Beanstalk environment properties, and both the bitbucket public key file and our private key file have been uploaded to the specified S3 bucket:

    S3_BUCKET_NAME="my_bucket"
    REPO_HOST_NAME="bitbucket.org"
    REPO_HOST_PUBLIC_KEY_NAME="bitbucket_public_key"
    REPO_PRIVATE_KEY_NAME="my_private_key"
    

    The configuration can then be accomplished using this hook in .platform/hooks/prebuild:

    #!/bin/bash
    
    # git is required to install our python packages directly from bitbucket
    yum -y install git
    
    # file paths (platform hooks are executed as root)
    SSH_KNOWN_HOSTS_FILE="/root/.ssh/known_hosts"
    SSH_CONFIG_FILE="/root/.ssh/config"
    PRIVATE_KEY_FILE="/root/.ssh/$REPO_PRIVATE_KEY_NAME"
    
    # remove any existing (stale) keys for our host from the known_hosts file
    [ -f "$SSH_KNOWN_HOSTS_FILE" ] && ssh-keygen -R "$REPO_HOST_NAME" -f "$SSH_KNOWN_HOSTS_FILE"
    
    # read the (fresh) host key from S3 file and append to known_hosts file
    aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_HOST_PUBLIC_KEY_NAME" - >> $SSH_KNOWN_HOSTS_FILE
    
    # copy our private key from S3 to our instance
    aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_PRIVATE_KEY_NAME" $PRIVATE_KEY_FILE
    
    # create an ssh config file to point to the private key file
    tee $SSH_CONFIG_FILE <<HERE
    Host $REPO_HOST_NAME
        User git
        Hostname $REPO_HOST_NAME
        IdentityFile $PRIVATE_KEY_FILE
    HERE
    
    # file permissions must be restricted
    chmod 600 $SSH_CONFIG_FILE
    chmod 600 $PRIVATE_KEY_FILE
    

    Note this file requires execution permission (chmod +x <file path>).

    UPDATE:

    Instead of storing the key in an S3 bucket, it may be more convenient to use either AWS Systems Manager Parameter Store, or AWS Secrets Manager. Do note this may involve additional charges.
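
    For example, assuming the private key has been stored as a SecureString parameter (the parameter name below is hypothetical) and the instance role is allowed ssm:GetParameter, the prebuild hook could fetch it like this instead of using aws s3 cp:

    # hypothetical parameter name; requires ssm:GetParameter on the instance role
    aws ssm get-parameter --name "/my_app/repo_private_key" \
        --with-decryption --query "Parameter.Value" --output text > "$PRIVATE_KEY_FILE"
    chmod 600 "$PRIVATE_KEY_FILE"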

    Detailed explanation

    Read on for a detailed rationale.

    Git

    To access a git repository, our Elastic Beanstalk environment will need to have git installed. This can be done in a platform hook using yum (the -y flag automatically answers "yes" to every prompt):

    yum -y install git
    

    SSH keys

    To set up an SSH connection between our Elastic Beanstalk (EB) instance and e.g. a bitbucket repository, we need three SSH keys:

    • the repository host's public key (e.g. the bitbucket.org host key), to be added to the known_hosts file on the instance;
    • our own public key, registered with the repository host to grant read access;
    • our own private key, which must be available on the instance during deployment.
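
    If needed, a dedicated key pair for read-only deployment access can be generated locally; a minimal sketch (the file name and empty passphrase are assumptions, not part of the original setup):

    # generate a dedicated deployment key pair without a passphrase
    ssh-keygen -t ed25519 -f my_private_key -N ""
    # then register my_private_key.pub with the repository host,
    # e.g. as a read-only "access key" on bitbucket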

    Storing the keys on AWS

    The public bitbucket host key and our private repo key need to be available in the EB environment during deployment. The private key is secret, so it should not be stored in the source code, nor should it be otherwise version controlled.

    The most convenient option would be to store the key values as EB environment properties (i.e. environment variables), because these are readily available during deployment. In principle, this can be done, e.g. using base64 encoding to store the multiline private key in a single line environment property. However, the total size of all EB environment property keys and values combined is limited to a mere 4096 bytes, which basically precludes this option.
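
    For illustration, such a base64 round trip could look as follows (the property name REPO_PRIVATE_KEY_B64 is an assumption); it only works if the encoded key, together with all other properties, stays within the 4096-byte limit:

    # locally (GNU base64): encode the key as a single line for use as an environment property
    base64 -w 0 my_private_key

    # in the prebuild hook: decode the property back into a key file
    echo "$REPO_PRIVATE_KEY_B64" | base64 -d > "$PRIVATE_KEY_FILE"
    chmod 600 "$PRIVATE_KEY_FILE"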

    An alternative is to store the key files in a secure private bucket on AWS S3. The documentation describes how to set up an IAM role that grants access to your S3 bucket for the EC2 instance. The documentation does provide a configuration example, but this uses .ebextensions and does not apply to .platform hooks.

    In short, we can create a basic S3 bucket with default settings ("block public access" enabled, no custom permissions), and upload the SSH key files to that bucket. Then, using the AWS IAM web console, select the aws-elasticbeanstalk-ec2-role (or, preferably, create a custom role), and attach the AmazonS3ReadOnlyAccess policy.
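
    The same policy can also be attached using the AWS CLI, for example:

    # attach the read-only S3 access policy to the default EB instance role
    aws iam attach-role-policy \
        --role-name aws-elasticbeanstalk-ec2-role \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess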

    Yet another alternative would be to use AWS Systems Manager Parameter Store or AWS Secrets Manager, as noted in the update above.

    During deployment to Elastic Beanstalk, we can use .platform hooks to download the key files from the S3 bucket to the EC2 instance using the AWS CLI.

    To test connectivity between EC2 and S3, we could use eb ssh to connect to the EC2 instance, followed by, for example, aws s3 ls s3://<bucket name> to list bucket contents.
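
    Using the bucket name from the summary above:

    eb ssh                    # connect to the EC2 instance
    aws s3 ls s3://my_bucket  # on the instance: list the bucket contents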

    Updating known_hosts

    To indicate that bitbucket.org is a trusted host, its public key needs to be added to the known_hosts file on our instance. In our platform hook script, we remove any existing public keys for the host, in case they are stale, and replace them by the current key from our file on S3:

    SSH_KNOWN_HOSTS_FILE="/root/.ssh/known_hosts"
    [ -f "$SSH_KNOWN_HOSTS_FILE" ] && ssh-keygen -R "$REPO_HOST_NAME" -f "$SSH_KNOWN_HOSTS_FILE"
    aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_HOST_PUBLIC_KEY_NAME" - >> $SSH_KNOWN_HOSTS_FILE
    

    Specifying the private key

    The private key can be downloaded from S3 as follows, and we need to restrict the file permissions:

    PRIVATE_KEY_FILE="/root/.ssh/$REPO_PRIVATE_KEY_NAME"
    aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_PRIVATE_KEY_NAME" $PRIVATE_KEY_FILE
    chmod 600 $PRIVATE_KEY_FILE
    

    An SSH configuration file is also required to point to the private key:

    tee $SSH_CONFIG_FILE <<HERE
    Host $REPO_HOST_NAME
        User git
        Hostname $REPO_HOST_NAME
        IdentityFile $PRIVATE_KEY_FILE
    HERE
    chmod 600 $SSH_CONFIG_FILE
    

    Again, file permissions must be restricted.
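
    To verify the setup, we can connect to the instance (eb ssh, then sudo -i, since the hooks run as root) and test SSH authentication; bitbucket should report a successful login without opening a shell:

    ssh -T git@bitbucket.org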

    The final script is shown in the summary at the top. This script could be stored e.g. as .platform/hooks/prebuild/01_configure_bitbucket_ssh.sh in the project folder.

    Hooks and confighooks

    Note that Amazon Linux 2 uses .platform/hooks for normal deployments and .platform/confighooks for configuration deployments. Often, identical scripts need to be used in both cases. To prevent code duplication, our .platform/confighooks/prebuild/01_configure_bitbucket_ssh.sh could look like this:

    #!/bin/bash
    source ".platform/hooks/prebuild/01_configure_bitbucket_ssh.sh"
    

    Note from the docs:

    [...] All files run as the root user. The current working directory (cwd) for platform hooks is the application's root directory. For prebuild and predeploy files it's the application staging directory, and for postdeploy files it's the current application directory. [...]