architecture, microservices, mesh, event-driven-design

How can microservice-based applications use a Service Mesh and Event-Driven Architecture together?


I came across some applications that say they use Event-Driven Architecture to communicate between microservices (with Kafka as the event broker) and also a Service Mesh (Linkerd).

There are a couple of questions I have not found answers for:

  1. From what I've read, one of the main features of a Service Mesh (in this case Linkerd) is to help with service-to-service communication (service discovery, retries, circuit breaking, ...). If an application uses Kafka as an event broker to communicate between microservices, how does the Service Mesh come into the picture?

Let's say we have ServiceA and ServiceB (both with multiple deployments / nodes). If ServiceA wants to talk to ServiceB, it can produce to a Kafka topic, and ServiceB can subscribe. How can a Service Mesh be present in this communication, and how can it improve it?

  2. If we have multiple deployments of ServiceB because of the load, how does load balancing happen here? If each deployment has a "sidecar" proxy, how do they decide how to read from Kafka, and which partitions the particular nodes read? Do they operate as a consumer group?

Solution

  • Service meshes and event-driven approaches (at least those built around a durable log, as in Kafka) are complementary because they fundamentally aid different modes of communication. So the short answer is that the service mesh has nothing to do with Kafka and vice versa (the one exception might be if you decide to discover Kafka brokers via the mesh).

    A service mesh is primarily concerned with temporally coupled communications: those where the communicators are both running at the same time. This pattern facilitates conversational communication (e.g. request-response). It's not that effective for an event-driven approach (because you typically will need to know something about events that happened when you weren't running) unless your services themselves happen to be, in effect, reimplementing Kafka (e.g. streaming events through WebSockets or gRPC or whatever).

    Conversely, Kafka is well suited to communications where you don't care when the receiver gets the message (which extends to not caring how many times a message is received or even, potentially, whether it is received at all). Trying to implement request-response in Kafka is a "that way lies madness and/or lighting massive quantities of resources on fire" sort of thing.

    So in your first case, ServiceA and ServiceB are unlikely to use the service mesh to facilitate their use of Kafka (but are still free to use the service mesh for other purposes). In your second case, the usual Kafka partitioning and consumer-group approach would be the most likely way to do load balancing (and failure handling, etc.).

    Horses for courses and all that.
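To make the consumer-group point concrete, here is a minimal, hypothetical sketch (plain Python, no real Kafka client; the function name and replica names are illustrative) of how a range-style assignor divides a topic's partitions across the consumers in a group. It is this assignment, not the mesh sidecars, that load-balances reads across ServiceB's deployments:

```python
def assign_partitions(partitions, consumers):
    """Range-style assignment: sort the consumers, then hand each one a
    contiguous slice of partitions; any leftover partitions go to the
    first consumers in sorted order (as Kafka's range assignor does)."""
    consumers = sorted(consumers)
    n_parts, n_cons = len(partitions), len(consumers)
    per = n_parts // n_cons          # base number of partitions each
    extra = n_parts % n_cons         # leftovers for the first consumers
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        take = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + take]
        start += take
    return assignment

# Three ServiceB replicas sharing a six-partition topic:
print(assign_partitions(list(range(6)), ["b-0", "b-1", "b-2"]))
# -> {'b-0': [0, 1], 'b-1': [2, 3], 'b-2': [4, 5]}
```

If a replica joins or leaves, the group rebalances and the partitions are redistributed the same way, which is also how failure handling falls out of the consumer-group model.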