amazon-ec2 terraform aws-code-deploy aws-auto-scaling blue-green-deployment

How to use a blue/green deployment style with CodeDeploy and Auto Scaling Groups that allows for repetitive deployments in Terraform?


I have a load-balanced EC2 instance hosting a web API, and I am trying to use AWS CodeDeploy to deploy new versions of the API to it. I want to use a blue/green deployment style, where the "blue" instance is the original EC2 instance, which is terminated once the "green" instance has the application installed and traffic is directed to it.

Our AWS infrastructure is set up and deployed using Terraform; this includes the auto scaling group for the EC2 instance, the CodeDeploy application and configuration, the load balancer, etc. My understanding is that for the aws_codedeploy_deployment_group Terraform resource, I should stay away from using COPY_AUTO_SCALING_GROUP within green_fleet_provisioning_option. There is a note on the linked Terraform page that explains why:

When using green_fleet_provisioning_option with the COPY_AUTO_SCALING_GROUP action, CodeDeploy will create a new ASG with a different name. This ASG is not managed by terraform and will conflict with existing configuration and state. You may want to use a different approach to managing deployments that involve multiple ASG, such as DISCOVER_EXISTING with separate blue and green ASG.

If I use COPY_AUTO_SCALING_GROUP to copy the existing auto scaling group during a deployment, CodeDeploy will copy the existing auto scaling group, give it a new name, and use it for the green fleet. The issue with this is that the terraform state becomes unaware of this auto scaling group, meaning a re-deployment of our infrastructure would cause duplicate EC2s to be created (and running). This causes issues with future deployments.

Like the note says, I should probably be using DISCOVER_EXISTING (i.e. manually provision instances for a deploy). The CodeDeploy deployment group configuration setting in the console says for this:

I will specify here the instances where the current application revision is running. I will specify the instances for the replacement environment when I create a deployment.

"Here" being the deployment group configuration, which is setup via the terraform in my case. For the "specify the instances for the replacement environment" part, you would provide a --target-instances option when you execute the create-deployment command using the AWS CLI, as described in the AWS Documentation. For this option, you can supply either an auto scaling group name, or EC2 tag sets as a JSON string. I opted for an auto scaling group name for this option, where this auto scaling group is also created via our terraform. Thus I end up with two auto scaling groups where each has a desired capacity of only one, and after deploying the terraform, results in two EC2s hosting my API.

To perform a deployment then, this is the resulting command to start the deployment (called via a GitHub action):

aws deploy create-deployment --application-name <app_name> --s3-location bucket=<bucket_Name>,key=revision_LATEST.zip,bundleType=zip --deployment-group-name <deployment_group_name> --deployment-config-name CodeDeployDefault.OneAtATime --description "My description" --region <region> --target-instances={\"autoScalingGroups\":[\"<my_green_auto_scaling_group_name>\"]}
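Hand-escaping the JSON for --target-instances is easy to get wrong. As a minimal sketch, a small helper can print that JSON for one or more auto scaling group names. The name build_target_instances is hypothetical (it is not part of the AWS CLI), and the sketch assumes the group names contain no characters that need JSON escaping:

```shell
# Hypothetical helper: prints the --target-instances JSON for the given
# auto scaling group names. Assumes names need no JSON escaping.
# The body runs in a subshell ( ... ) so its variables stay local.
build_target_instances() (
  json='{"autoScalingGroups":['
  sep=''
  for name in "$@"; do
    json="${json}${sep}\"${name}\""
    sep=','
  done
  printf '%s]}\n' "$json"
)
```

The output could then be passed as --target-instances "$(build_target_instances <my_green_auto_scaling_group_name>)" in the command above.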

At this point, this deployment runs and works. But afterwards, you can no longer do any more deployments, because the target instance auto scaling group has now essentially become the "blue" group, and no "green" group exists anymore. In fact, CodeDeploy automatically updates the section of the deployment group configuration where you specify the instances running the current version of your application, setting it to the former target instance auto scaling group. If I try to re-run the CLI command, the deployment fails during the "Installing application on replacement instances" step with:

The deployment failed because no instances were found for your deployment group

It feels like I am missing something. I would think CodeDeploy should be aware that it needs a green instance in order to do a deployment. Am I supposed to create a new auto scaling group after each deployment and somehow keep my GitHub Action aware of this auto scaling group for when I call the CLI command? I am also not fond of having two EC2s running the API upon initial deployment of the Terraform. Ideally, I'd like to have only one EC2 running at a time, and a way to allow deployments to occur continuously using the blue/green deployment style. Is this possible?


Solution

  • I ended up having two autoscaling groups, and cycled between them for the deployments. This allowed me to maintain my terraform state between the autoscaling groups and the CodeDeploy infrastructure. The only downside with this implementation is that it leaves whatever EC2s are tied to the "blue" group active, but traffic is blocked from them automatically by CodeDeploy (should you have that enabled).

    The steps were:

    1. Define two autoscaling groups in your terraform. Give them names that will not change when running your terraform multiple times (i.e. something static, or environment based if you have multiple environments). When your terraform runs, you should have two autoscaling groups with their linked EC2s up and running (note - both autoscaling groups will initially have traffic flowing to them, at least in the way I implemented this).
    2. Make sure your CodeDeploy configuration is set to the DISCOVER_EXISTING option in your terraform. You will also want to specify the autoscaling group that you wish to be your blue group in the terraform for the aws_codedeploy_deployment_group by specifying the autoscaling_groups option.
    3. In your CI/CD code (in my case, GitHub Actions), you will want to maintain local variables for the names of your two autoscaling groups. Since you will be switching which autoscaling group is your --target-instances between deployments, you need to determine which autoscaling group is currently the "green" one. You can do this using the AWS CLI, notably by using get-deployment-group and looking at which autoscaling group is currently assigned to it (whatever is assigned to it is the blue group, thus the green group is your other autoscaling group).
    4. Create the deployment against your application, setting the --target-instances option of the create-deployment command.

    A bash example for doing this via the AWS CLI is below:

    # Names of the two autoscaling groups created by terraform.
    asg_one_name="autoscalingGroupOneName"
    asg_two_name="autoscalingGroupTwoName"
    # Whatever group is attached to the deployment group is the current blue group.
    blue_asg=$(aws deploy get-deployment-group --application-name app_name --deployment-group-name deployment_group_name --query 'deploymentGroupInfo.autoScalingGroups[0].name' --output text)
    if [ "$asg_one_name" == "$blue_asg" ]; then
      green_auto_scaling_group="$asg_two_name"
      blue_auto_scaling_group="$asg_one_name"
    else
      green_auto_scaling_group="$asg_one_name"
      blue_auto_scaling_group="$asg_two_name"
    fi
    echo "The green autoscaling group to use is: $green_auto_scaling_group."
    echo "Deploying application..."
    # Target the green group; --output text leaves just the deployment ID.
    deploymentId=$(aws deploy create-deployment --application-name app_name --s3-location bucket=bucketName,key=revision_latest.zip,bundleType=zip --deployment-group-name deployment_group_name --deployment-config-name CodeDeployDefault.OneAtATime --description "My description" --region us-east-2 --target-instances="{\"autoScalingGroups\":[\"$green_auto_scaling_group\"]}" --output text)
    echo "Waiting for Deployment to complete..."
    sleep 30
    # Poll the deployment status every 30 seconds, up to 60 times.
    for i in {1..60}
    do
      deploymentStatus=$(aws deploy get-deployment --deployment-id "$deploymentId" --query 'deploymentInfo.status' --output text)
      if [ "$deploymentStatus" == 'Succeeded' ]; then
        echo "Deployment succeeded."
        break
      elif [ "$deploymentStatus" == 'Failed' ] || [ "$deploymentStatus" == 'Stopped' ]; then
        echo "Deployment failed or was stopped (status: $deploymentStatus)."
        exit 1
      elif [ "$i" == 60 ]; then
        # Any non-terminal status on the last attempt counts as a timeout.
        echo "Deployment did not complete within the expected time.  Exiting."
        exit 1
      else
        echo "Deployment is still in progress at loop interval $i.  Waiting 30 seconds before checking again..."
        sleep 30
      fi
    done
    

    Essentially, this determines which autoscaling group is your green group, and creates a deployment where the --target-instances specifies the green group. It then polls the status of the deployment to make sure it completes.
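
    As an alternative to a hand-written polling loop, the AWS CLI ships a waiter, aws deploy wait deployment-successful, that blocks until the deployment succeeds and exits non-zero otherwise. A minimal sketch follows; wait_for_deployment is a hypothetical wrapper name, and the polling interval and timeout are whatever the waiter's built-in defaults are rather than values set by the script:

    ```shell
    # Hypothetical wrapper around the CLI waiter.
    # $1 = deployment ID returned by create-deployment.
    wait_for_deployment() {
      if aws deploy wait deployment-successful --deployment-id "$1"; then
        echo "Deployment succeeded."
      else
        # Look up the last known status to make the failure easier to diagnose.
        last_status=$(aws deploy get-deployment --deployment-id "$1" --query 'deploymentInfo.status' --output text)
        echo "Deployment did not succeed (last status: $last_status)." >&2
        return 1
      fi
    }
    ```

    In the script above, this could be called as wait_for_deployment "$deploymentId" || exit 1 in place of the loop.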

    Once the deployment completes, CodeDeploy automatically updates the deployment group configuration so that its blue autoscaling group is whatever $green_auto_scaling_group was in the script. Because of this, you should be able to re-run a deployment as often as you want, with the script correctly determining the green group each time. Lastly, because both autoscaling groups are created in your terraform, they are maintained in its state, so running your terraform shouldn't cause any abnormal behavior anymore.
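
    One caveat with the blue/green selection: if the group reported by get-deployment-group matches neither of the two names (for instance, when no autoscaling group is attached to the deployment group), a plain if/else silently treats the first group as green. A minimal sketch of a stricter selection, using a hypothetical helper name pick_green_asg, could look like this:

    ```shell
    # Hypothetical helper: prints the green group name, or fails if the
    # reported blue group is neither of the two known groups.
    # $1 = first ASG name, $2 = second ASG name, $3 = current blue ASG.
    pick_green_asg() {
      if [ "$3" = "$1" ]; then
        printf '%s\n' "$2"
      elif [ "$3" = "$2" ]; then
        printf '%s\n' "$1"
      else
        printf 'Blue group "%s" matches neither "%s" nor "%s".\n' "$3" "$1" "$2" >&2
        return 1
      fi
    }
    ```

    It could be used as green_auto_scaling_group=$(pick_green_asg "$asg_one_name" "$asg_two_name" "$blue_asg") || exit 1, so the deployment is never started against an unexpected group.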