amazon-web-services, aws-cdk, amazon-vpc

Decrease the number of AZs for a VPC using AWS CDK


I have a VPC created using AWS CDK with the following config:

        const vpc = new cdk.aws_ec2.Vpc(this, `vpc-${ENV}`, {
            maxAzs: 3,
            natGateways: 1,
            subnetConfiguration: [
                {
                    subnetType: cdk.aws_ec2.SubnetType.PUBLIC
                },
                {
                    subnetType: cdk.aws_ec2.SubnetType.PRIVATE_WITH_EGRESS
                },
                {
                    subnetType: cdk.aws_ec2.SubnetType.PRIVATE_ISOLATED
                }
            ]
        });

This is working fine for the dev, prod, and staging infra (already deployed). However, we wish to update the development environment using the same stack but limit maxAzs to 1, as we do not require high availability there (this should save on cost).

Doing this with:

        // ... rest
        maxAzs: ENV === 'dev' ? 1 : 3,
        // ... rest

When redeploying to update the dev env, I get the following error:

        xxx-stack failed: Error: The stack named xxx failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "The CIDR '...' conflicts with another subnet (Service: Ec2, Status Code: 400, Request ID: xxx)"

I presume this is due to the dev env already existing with 3 AZs and the resulting 9 subnets. Is it possible to decrease the number of AZs without destroying the stack completely and rebuilding it?


Solution

  • This is not straightforward, because by default the entire CIDR range is divided equally between all subnets.

    Imagine, for example, the following simplified scenario: you have a VPC with a CIDR range between 10.0.0.0 and 10.0.3.0, with only 3 subnets in 3 AZs. Each subnet will end up with (roughly) one third of that IP range.

    Note: I say roughly because in reality some IPs will be reserved, so the distribution might not be exactly even, but it should be close.

    Now imagine that you change your CDK code to have only 1 subnet. The generated CloudFormation template then assigns the entire CIDR range to that single subnet. But keep in mind that a CloudFormation deployment always follows the pattern create/update resources -> clean up old ones, so it will try to update your first subnet BEFORE the other two are deleted. This causes the CIDR range conflict you saw.
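
    To make the ordering problem concrete, here is a minimal sketch in plain TypeScript (no AWS calls) that checks whether two CIDR blocks overlap. The subnet values are hypothetical and only mirror the simplified scenario above; they are not what CDK would actually allocate for your VPC.

```typescript
// Convert "a.b.c.d/len" into the numeric [first, last] addresses of the block.
function cidrRange(cidr: string): [number, number] {
  const [ip = "", len = "32"] = cidr.split("/");
  const base = ip
    .split(".")
    .reduce((acc, octet) => acc * 256 + Number(octet), 0);
  const size = 2 ** (32 - Number(len));
  const first = Math.floor(base / size) * size; // align to the block boundary
  return [first, first + size - 1];
}

// Two blocks conflict when their address ranges intersect.
function overlaps(a: string, b: string): boolean {
  const [aFirst, aLast] = cidrRange(a);
  const [bFirst, bLast] = cidrRange(b);
  return aFirst <= bLast && bFirst <= aLast;
}

// Hypothetical values, not taken from a real deployment.
const deployed = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]; // subnets 1..3
const resizedFirst = "10.0.0.0/22"; // subnet 1 after dropping to a single AZ

// During the update phase subnets 2 and 3 still exist, so the resized
// subnet 1 is compared against them before they are cleaned up.
const conflictsDuringUpdate = deployed
  .slice(1)
  .filter((cidr) => overlaps(resizedFirst, cidr));
console.log(conflictsDuringUpdate); // [ '10.0.1.0/24', '10.0.2.0/24' ]
```

    Once subnets 2 and 3 are gone there is nothing left for the larger block to collide with, which is why both approaches below remove them first.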

    How can you fix this?

    Approach 1 (via code): One approach would be to make this change in two steps:

    1. Delete the other two subnets without changing the first subnet's CIDR range. (Unfortunately, last I checked, a subnet's CIDR range was not exposed by the CDK L2 construct, so you might need to drop down to the L1 construct.)
    2. Once step 1 is deployed, remove the CIDR range override you added for the first subnet in step 1.

    I haven't tried the steps above myself, but in theory, after step 1 your two other subnets should be deleted while subnet 1's CIDR range is unchanged. This will then allow you to change subnet 1's CIDR range in the second step. In your case you have several subnets in the first AZ, so you will need to override the CIDR ranges for all three of them.
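
    Step 1 of Approach 1 could look roughly like the sketch below. It is untested against a real deployment. `pinSubnetCidrs`, `SubnetLike` and `CfnResourceLike` are hypothetical names introduced here; the interfaces only describe the parts of the CDK objects the helper touches (`node.children`, `cfnResourceType`, `addPropertyOverride`), so the block has no dependency on `aws-cdk-lib`. The CIDR strings you pass in are placeholders for the values your deployed first-AZ subnets currently have.

```typescript
// Minimal shape of an L1 resource: CDK's CfnResource has both members.
interface CfnResourceLike {
  cfnResourceType: string;
  addPropertyOverride(propertyPath: string, value: unknown): void;
}

// Minimal shape of an L2 subnet: entries of vpc.publicSubnets etc. expose `node`.
interface SubnetLike {
  node: { path: string; children: object[] };
}

function isCfnSubnet(child: object): child is CfnResourceLike {
  const c = child as Partial<CfnResourceLike>;
  return (
    c.cfnResourceType === "AWS::EC2::Subnet" &&
    typeof c.addPropertyOverride === "function"
  );
}

// Pin each subnet's CidrBlock to an explicit value on the underlying L1 resource.
function pinSubnetCidrs(subnets: SubnetLike[], cidrs: string[]): void {
  if (subnets.length !== cidrs.length) {
    throw new Error(`Expected ${subnets.length} CIDRs, got ${cidrs.length}`);
  }
  subnets.forEach((subnet, i) => {
    const cfnSubnet = subnet.node.children.find(isCfnSubnet);
    if (!cfnSubnet) {
      throw new Error(`No AWS::EC2::Subnet resource under ${subnet.node.path}`);
    }
    cfnSubnet.addPropertyOverride("CidrBlock", cidrs[i]);
  });
}

// Assumed usage inside the stack, after the Vpc is created with maxAzs set to 1 for dev:
//   if (ENV === 'dev') {
//     pinSubnetCidrs(vpc.publicSubnets,   ['<current CIDR of public subnet 1>']);
//     pinSubnetCidrs(vpc.privateSubnets,  ['<current CIDR of private subnet 1>']);
//     pinSubnetCidrs(vpc.isolatedSubnets, ['<current CIDR of isolated subnet 1>']);
//   }
```

    Running `cdk diff` before deploying should then show the subnets in the other AZs being removed and no CidrBlock change on the three remaining ones. Step 2 is simply deleting the `pinSubnetCidrs` calls and deploying again.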

    Approach 2 (manual): You can also go into the console and manually delete the other subnets first, then try to deploy this change again. This is much simpler than the previous approach, but it only works if you have permission to perform destructive actions in your AWS console.