amazon-web-servicesmigrationcapacitydisaster-recovery

Is their enough capacity in all AWS regions for disaster recovery


In case of a disaster, when an entire AWS region fails and all its customers want to move their workloads to the next closest region in a disaster recovery scenario, is AWS ready for this? I imagine millions of servers running in each region. Is AWS ready to provision them in another region the next day? Do they have that capacity at the ready?


Solution

  • AWS global infrastructure is using the concept of Availability Zones inside each region, to partition the resources, isolate the risks and ultimately reduce the blast radius of an eventual failure. AZs are groups of datacenter within a region that are designed to be independent of each others in terms of risks (i.e. different connection to the power grid, redundant and isolated network infrastructure, isolated in terms of geographical risks such as earthquake, fooding etc) Some services are designed to automatically take advantage of this redundant infrastructure (Amazon S3, Amazon DynamoDB, ELB etc), customer do not need to configure anything, redundancy and failover at the regional level is handled by the service. Some other services are operating at AZ level (Amazon EC2, EBS, RDS etc) Fo these services, the best practice is to design for multiple AZ architecture and data replication.
    In the very unlikely case a service would not be available in an AZ, a well architected architecture will transparently fail over to another AZ, without any noticeable customer impact.

    Back to your question, the architecture is designed to avoid a region-wide failure of all services. This never happened since we launched AWS in 2006. And, yes, we have a lot of capacity. I propose you to watch this keynote from James Hamilton to learn more about it https://www.youtube.com/watch?v=AyOAjFNPAbA