
AWS ECS Service Connect versus Service Discovery


AWS Cloud Map allows you to set up a namespace for your VPC, and then assign names within that namespace to individual services. The names can be A) privately discoverable only by API calls, B) discoverable by API calls or by DNS privately within the VPC, or C) discoverable by public DNS and by API calls. ECS can interact with Cloud Map to register services automatically. All this is referred to in AWS ECS as Service Discovery.

AWS ECS also has a relatively new feature called Service Connect. It leverages Cloud Map but also adds a sidecar "proxy" container to your ECS service, effectively creating an automatic service mesh.

I got Service Connect working with ECS using CloudFormation. In my CloudFormation AWS::ECS::Cluster I configured ServiceConnectDefaults to the Cloud Map namespace I want to use, such as example.internal. Then I set enabled: true for the AWS::ECS::Service definitions under ServiceConnectConfiguration, along with a few extra details such as providing a name for the service/port. Assuming I've named my service/port my-service, I believe now that some other service using Service Connect in the same VPC could connect to my-service.example.internal and the sidecar-proxy would figure out some instance of my-service to connect to, without even using DNS! (I haven't tested that yet; I first wanted to get some clarification with the current question.)
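
As a rough sketch, the configuration described above might look like the following CloudFormation fragment. The logical IDs Cluster, MyService, and MyTaskDefinition are placeholders, not names from the original template, and other required service properties (launch type, networking, and so on) are omitted.

```yaml
Resources:
  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ServiceConnectDefaults:
        # Cloud Map namespace used by default for Service Connect in this cluster
        Namespace: example.internal

  MyService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref Cluster
      # Placeholder: a task definition assumed to be defined elsewhere
      TaskDefinition: !Ref MyTaskDefinition
      ServiceConnectConfiguration:
        Enabled: true
        Services:
          # Must match the Name of a port mapping in the task definition
          - PortName: my-service
```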

But I would like private DNS access as well, if for no other reason than to be able to go to Cloud9 and issue e.g. a curl my-service.example.internal/api/test without needing to look up the IP address of one of the my-service instances. I found out that I can define an AWS::ServiceDiscovery::PrivateDnsNamespace and an AWS::ServiceDiscovery::Service (using the same name my-service) and even associate the latter with my ECS service using ServiceRegistries. But then when I try to deploy my CloudFormation stack, I get an error:

Invalid request provided: CreateService error: Service already exists.

I'm guessing that internally, to get Service Connect to work, ECS created its own AWS::ServiceDiscovery::Service, at which point it saw that my CloudFormation stack had already created an AWS::ServiceDiscovery::Service with the same name. But if I don't create the AWS::ServiceDiscovery::Service myself, the one that ECS creates won't provide a DNS entry for my-service.

Am I to infer that AWS ECS can work with Service Connect (in which case there will be no service DNS entries, but the sidecar proxies will use API calls to look up registered services), or with Service Discovery (in which case I manually create Cloud Map DNS entries and ECS will automatically register instances based upon the AWS::ServiceDiscovery::Service I associate with the ECS service), but not both at the same time? Or did I configure something incorrectly?

I guess if I'm using Service Discovery and get DNS entries, I can simply indicate the (private in my case) DNS entries in the other services and they will find them via Cloud Map, providing me the same capabilities as Service Connect without the need for a sidecar proxy. But maybe Service Connect has some extra monitoring capabilities I'll be losing?

Can someone confirm whether this is a correct understanding, and elaborate on the practical differences and implications between using Service Connect and Service Discovery with ECS?


Solution

  • The benefit that Service Connect brings over service discovery using plain Cloud Map is faster failover when service instances go down. Using DNS-based lookup with Cloud Map means that when a service instance goes down, it may take a while (based on TTL settings) for your client to realize that it should get a new IP address. Even worse, your client library may keep the same IP address cached even longer, and/or your client's retry logic may keep trying the same IP address upon failure.

    Service Connect, on the other hand, introduces a sidecar "proxy" container that intercepts outgoing connections and routes them to the correct destinations. The sidecar uses API calls to Cloud Map to look up an IP address of a healthy instance of the service in real time, rather than relying on DNS entries, which may be stale. This brings the standard benefits of a service mesh built on a proxy such as Envoy, except that in this case Service Connect manages the sidecar for you. See Migrate existing Amazon ECS services from service discovery to Amazon ECS Service Connect for more discussion on these benefits.

    Because Service Connect doesn't rely on DNS, it doesn't bother registering even private DNS entries, and instead registers endpoints with Cloud Map that are privately discoverable only by API calls. There seems to be no way to tell Service Connect to register the service names in the DNS as well.

    You cannot use both Service Connect and Service Discovery at once for the same service name, because as mentioned in the question, both Service Connect and Service Discovery will try to register the same service name with Cloud Map. But you can use them both together using two different service names! If you define your task definition port mappings and service ServiceConnectConfiguration using a service/port discovery name such as my-service-connect, having specified the ECS cluster ServiceConnectDefaults with a namespace of example.internal, your other ECS services can connect to my-service-connect.example.internal even though there are no DNS entries for that name, as described above.
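
    A minimal sketch of the Service Connect half of this setup follows. The logical IDs, container name, image, and port numbers are illustrative assumptions, and properties not relevant to naming are omitted. DnsName is spelled out explicitly here so that the name other services use is visible in the template.

    ```yaml
    Resources:
      MyTaskDefinition:
        Type: AWS::ECS::TaskDefinition
        Properties:
          ContainerDefinitions:
            - Name: app                       # placeholder container name
              Image: example/my-service       # placeholder image
              PortMappings:
                # The port mapping name is what Service Connect refers to
                - Name: my-service-connect
                  ContainerPort: 8080         # assumed application port
                  Protocol: tcp

      MyService:
        Type: AWS::ECS::Service
        Properties:
          Cluster: !Ref Cluster               # cluster with ServiceConnectDefaults set
          TaskDefinition: !Ref MyTaskDefinition
          ServiceConnectConfiguration:
            Enabled: true
            Services:
              - PortName: my-service-connect
                DiscoveryName: my-service-connect
                ClientAliases:
                  # Name and port that other Service Connect services dial
                  - DnsName: my-service-connect.example.internal
                    Port: 80                  # assumed client-facing port
    ```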

    But you can additionally define an AWS::ServiceDiscovery::PrivateDnsNamespace of example.internal (which Service Connect will use instead of creating a new one) along with an AWS::ServiceDiscovery::Service using a different service name such as my-service. Associate this Service Discovery service with the ECS service using ServiceRegistries, and you will have the best of both worlds! ECS services can communicate using my-service-connect.example.internal, but you can still go to Cloud9 and connect to my-service.example.internal (note the different name) via DNS, as ECS will ensure that both are registered. It's not guaranteed that both approaches will refer to the same service instance at any particular time, and if a service instance goes down, the DNS entry for my-service.example.internal may be stale until the new DNS value is propagated, but for ad-hoc tests in Cloud9 (the motivation for this in the first place) that hardly matters.
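
    The Service Discovery half might be sketched as below. The logical IDs, the Vpc reference, and the TTL value are assumptions for illustration. A records as shown assume tasks running in awsvpc network mode. The ServiceConnectConfiguration on the ECS service stays as it was and is omitted here.

    ```yaml
    Resources:
      PrivateNamespace:
        Type: AWS::ServiceDiscovery::PrivateDnsNamespace
        Properties:
          Name: example.internal
          Vpc: !Ref Vpc                       # placeholder: VPC defined elsewhere

      DiscoveryService:
        Type: AWS::ServiceDiscovery::Service
        Properties:
          Name: my-service                    # differs from my-service-connect
          NamespaceId: !GetAtt PrivateNamespace.Id
          DnsConfig:
            DnsRecords:
              - Type: A
                TTL: 60                       # assumed; bounds how stale DNS can get

      MyService:
        Type: AWS::ECS::Service
        Properties:
          # ...ServiceConnectConfiguration and other properties omitted...
          ServiceRegistries:
            - RegistryArn: !GetAtt DiscoveryService.Arn
    ```

    One possible design choice, not confirmed by the original templates: setting the cluster's ServiceConnectDefaults Namespace to !GetAtt PrivateNamespace.Arn rather than the plain name gives CloudFormation an explicit dependency, so the private DNS namespace is created before the cluster refers to it.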

    I've published a blog post, Using AWS ECS Service Connect and Service Discovery Together, which goes into more depth on how to get this working, complete with CloudFormation template snippets.