Suppose we have a custom Python package, called shared_package, in a private repository hosted on GitHub or Bitbucket. Our private repository is configured for read-only access via SSH, as described e.g. here for GitHub and here for Bitbucket.
Another one of our projects, aptly named dependent_project, depends on this shared_package and needs to be deployed to AWS Elastic Beanstalk (EB). Our environment uses the latest "Python on Amazon Linux 2" platform, and we use pipenv as package manager.
For various reasons, it would be most convenient for us to install shared_package directly from our online git repository, as described here for pipenv and here for pip.
The Pipfile for our dependent_project looks like this:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
shared_package = {git = "ssh://bitbucket.org/our_username/shared_package.git", editable = true, ref = "2021.0"}

[dev-packages]
awsebcli = "*"

[requires]
python_version = "3.8"
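For reference, the same VCS dependency can also be written as a plain pip requirement (e.g. in a requirements.txt). This is an illustrative equivalent of the Pipfile entry above, with the git user given explicitly in the URL rather than via an SSH config:

```
-e git+ssh://git@bitbucket.org/our_username/shared_package.git@2021.0#egg=shared_package
```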
This works well on our local development systems, but when deploying dependent_project to Elastic Beanstalk, the pipenv installation fails with Permission denied (publickey).
This leads to the question: how do we configure an Elastic Beanstalk environment, using Amazon Linux 2 platform hooks, so that pipenv can successfully install a package from a private online git repository via SSH?
Some pieces of the puzzle can be found in existing discussions, but these do not use Amazon Linux 2 platform hooks.
Assume we have defined the following Elastic Beanstalk environment properties, and both the Bitbucket public key file and our private key file have been uploaded to the specified S3 bucket:
S3_BUCKET_NAME="my_bucket"
REPO_HOST_NAME="bitbucket.org"
REPO_HOST_PUBLIC_KEY_NAME="bitbucket_public_key"
REPO_PRIVATE_KEY_NAME="my_private_key"
The configuration can then be accomplished using the following hook in .platform/hooks/prebuild:
#!/bin/bash
# git is required to install our python packages directly from bitbucket
yum -y install git
# file paths (platform hooks are executed as root)
SSH_KNOWN_HOSTS_FILE="/root/.ssh/known_hosts"
SSH_CONFIG_FILE="/root/.ssh/config"
PRIVATE_KEY_FILE="/root/.ssh/$REPO_PRIVATE_KEY_NAME"
# remove any existing (stale) keys for our host from the known_hosts file
[ -f "$SSH_KNOWN_HOSTS_FILE" ] && ssh-keygen -R "$REPO_HOST_NAME"
# read the (fresh) host key from the S3 file and append it to the known_hosts file
aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_HOST_PUBLIC_KEY_NAME" - >> "$SSH_KNOWN_HOSTS_FILE"
# copy our private key from S3 to our instance
aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_PRIVATE_KEY_NAME" "$PRIVATE_KEY_FILE"
# create an ssh config file that points to the private key file
tee "$SSH_CONFIG_FILE" <<HERE
Host $REPO_HOST_NAME
  User git
  Hostname $REPO_HOST_NAME
  IdentityFile $PRIVATE_KEY_FILE
HERE
# file permissions must be restricted
chmod 600 "$SSH_CONFIG_FILE"
chmod 600 "$PRIVATE_KEY_FILE"
Note that this file requires execution permission (chmod +x <file path>).
UPDATE:
Instead of storing the key in an S3 bucket, it may be more convenient to use either AWS Systems Manager Parameter Store, or AWS Secrets Manager. Do note this may involve additional charges.
Read on for a detailed rationale.
To access a git repository, our Elastic Beanstalk environment needs to have git installed. This can be done in a platform hook using yum (-y assumes "yes" to every question):
yum -y install git
To set up an SSH connection between our Elastic Beanstalk (EB) instance and e.g. a bitbucket repository, we need three SSH keys:
The public key for bitbucket.org, to verify that we are connecting to a trusted host.
To obtain the public key for bitbucket.org, in a suitable format for known_hosts, we can use ssh-keyscan.
To be on the safe side, we should verify this key using a "trusted" source.
In our case the best we can do is compare the public key fingerprint with the "official" one published on the bitbucket (or github) website.
The fingerprint can be calculated from the public key using ssh-keygen, e.g.
ssh-keyscan -t rsa bitbucket.org | ssh-keygen -lf -
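The fingerprint step itself can be tried offline with a throwaway key, without contacting bitbucket.org (key type and file path are illustrative):

```shell
# generate a throwaway RSA key pair, without a passphrase, at an example path
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/demo_host_key -q
# compute its fingerprint, just as ssh-keygen -lf does for the ssh-keyscan output
ssh-keygen -lf /tmp/demo_host_key.pub
```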
The private key and public key for our repository.
A key pair, consisting of a private and a public key, can be generated using ssh-keygen.
The private key must be kept secret; the public key must be added to the list of "access keys" for the Bitbucket repository, as described in the Bitbucket docs.
Note that it is most convenient to create a key pair without a passphrase; otherwise our script would need to handle the passphrase as well.
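As a sketch, generating such a passphrase-less pair could look like this (key type and file name are just examples); the .pub file is what gets added to the repository's access keys:

```shell
# create a key pair without a passphrase (-N "") at an example location
ssh-keygen -t ed25519 -N "" -f /tmp/demo_repo_key -q
# the public half is safe to share and goes into the repo's access keys
cat /tmp/demo_repo_key.pub
```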
The public bitbucket host key and our private repo key need to be available in the EB environment during deployment. The private key is secret, so it should not be stored in the source code, nor should it be otherwise version controlled.
The most convenient option would be to store the key values as EB environment properties (i.e. environment variables), because these are readily available during deployment.
In principle, this can be done, e.g. using base64 encoding to store the multiline private key in a single-line environment property.
However, the total size of all EB environment property keys and values combined is limited to a mere 4096 bytes, which basically precludes this option.
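For completeness, the base64 round trip itself is straightforward; the 4096-byte limit, not the encoding, is the obstacle (file paths are illustrative):

```shell
# stand-in for a multiline private key file
printf 'line-1\nline-2\nline-3\n' > /tmp/demo_secret
# encode to a single line, as would be needed for an environment property
KEY_B64=$(base64 -w 0 < /tmp/demo_secret)
# decode back into a key file during deployment
printf '%s' "$KEY_B64" | base64 -d > /tmp/demo_secret_restored
cmp /tmp/demo_secret /tmp/demo_secret_restored && echo "round trip OK"
```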
An alternative is to store the key files in a secure private bucket on AWS S3.
The documentation describes how to set up an IAM role that grants the EC2 instance access to your S3 bucket. The documentation does provide a configuration example, but this uses .ebextensions and does not apply to .platform hooks.
In short, we can create a basic S3 bucket with default settings ("block public access" enabled, no custom permissions), and upload the SSH key files to that bucket.
Then, using the AWS IAM web console, select the aws-elasticbeanstalk-ec2-role (or, preferably, create a custom role) and attach the AmazonS3ReadOnlyAccess policy.
Yet another alternative would be to use the AWS parameter store or secrets manager.
During deployment to Elastic Beanstalk, we can use .platform hooks to download the key files from the S3 bucket to the EC2 instance using the AWS CLI.
To test connectivity between EC2 and S3, we could use eb ssh to connect to the EC2 instance, followed by, for example, aws s3 ls s3://<bucket name> to list the bucket contents.
To indicate that bitbucket.org is a trusted host, its public key needs to be added to the known_hosts file on our instance.
In our platform hook script, we remove any existing public keys for the host, in case they are stale, and replace them by the current key from our file on S3:
SSH_KNOWN_HOSTS_FILE="/root/.ssh/known_hosts"
[ -f "$SSH_KNOWN_HOSTS_FILE" ] && ssh-keygen -R "$REPO_HOST_NAME"
aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_HOST_PUBLIC_KEY_NAME" - >> "$SSH_KNOWN_HOSTS_FILE"
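The ssh-keygen -R behaviour can be exercised on a scratch file, without touching the real known_hosts (the host entry is generated locally just for this demo):

```shell
# build a scratch known_hosts file with a locally generated entry for bitbucket.org
ssh-keygen -t ed25519 -N "" -f /tmp/demo_scratch_key -q
printf 'bitbucket.org %s\n' "$(cut -d' ' -f1-2 /tmp/demo_scratch_key.pub)" > /tmp/demo_known_hosts
# remove all entries for the host, as the hook does before appending the fresh key
ssh-keygen -R bitbucket.org -f /tmp/demo_known_hosts
```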
The private key can be downloaded from S3 as follows; its file permissions need to be restricted:
PRIVATE_KEY_FILE="/root/.ssh/$REPO_PRIVATE_KEY_NAME"
aws s3 cp "s3://$S3_BUCKET_NAME/$REPO_PRIVATE_KEY_NAME" "$PRIVATE_KEY_FILE"
chmod 600 "$PRIVATE_KEY_FILE"
An SSH configuration file is also required to point to the private key:
tee "$SSH_CONFIG_FILE" <<HERE
Host $REPO_HOST_NAME
  User git
  Hostname $REPO_HOST_NAME
  IdentityFile $PRIVATE_KEY_FILE
HERE
chmod 600 "$SSH_CONFIG_FILE"
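To double-check which identity file the ssh client will actually resolve for the host, ssh -G prints the effective client configuration without opening a connection (run as root on the instance, so that it picks up /root/.ssh/config):

```shell
# show the resolved client configuration for bitbucket.org,
# including the IdentityFile entries ssh would try
ssh -G bitbucket.org | grep -i identityfile
```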
Again, file permissions must be restricted.
The final script is shown in the summary at the top.
This script could be stored e.g. as .platform/hooks/prebuild/01_configure_bitbucket_ssh.sh in the project folder.
Note that Amazon Linux 2 uses .platform/hooks for normal deployments and .platform/confighooks for configuration deployments. Often, identical scripts need to be used in both cases. To prevent code duplication, our .platform/confighooks/prebuild/01_configure_bitbucket_ssh.sh could simply look like this:
#!/bin/bash
source ".platform/hooks/prebuild/01_configure_bitbucket_ssh.sh"
Note from the docs:
[...] All files run as the root user. The current working directory (cwd) for platform hooks is the application's root directory. For prebuild and predeploy files it's the application staging directory, and for postdeploy files it's the current application directory. [...]