As an example, let's say that I'm building a simple social network. I currently have two services:

- Identity, managing the users, their personal data (e-mail, password hashes, etc.), their public profiles (username), and authentication
- Social, managing the users' posts, their friends, and their feed

The Identity service can give the public profile of a user using its API at /api/users/{id}:
// GET /api/users/1 HTTP/1.1
// Host: my-identity-service
{
"id": 1,
"username": "cat_sun_dog"
}
The Social service can give a post with its API at /api/posts/{id}:
// GET /api/posts/5 HTTP/1.1
// Host: my-social-service
{
"id": 5,
"content": "Cats are great, dogs are too. But, to be fair, the sun is much better.",
"authorId": 1
}
That's great, but my client, a web app, would like to show the post with the author's name, and it would preferably receive the following JSON data in a single REST request.
{
"id": 5,
"content": "Cats are great, dogs are too. But, to be fair, the sun is much better.",
"author": {
"id": 1,
"username": "cat_sun_dog"
}
}
I found two main ways to approach this.
As described in Microsoft's guide for data and Microsoft's guide for communication between microservices, it's possible for a microservice to replicate the data it needs by setting up an event bus (such as RabbitMQ) and consuming events from other services:
And finally (and this is where most of the issues arise when building microservices), if your initial microservice needs data that's originally owned by other microservices, do not rely on making synchronous requests for that data. Instead, replicate or propagate that data (only the attributes you need) into the initial service's database by using eventual consistency (typically by using integration events, as explained in upcoming sections).
Therefore, the Social service can consume events produced by the Identity service, such as UserCreatedEvent and UserUpdatedEvent. Then, the Social service can have in its very own database a copy of all the users, but only the required data (their Id and Username, nothing more).
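As a rough sketch of what that consumer side could look like in the Social service (the event shapes and the names UserReplica and SocialDbContext are assumptions for illustration, not part of the question; the actual bus wiring, e.g. RabbitMQ subscriptions, is omitted):

// Events published by the Identity service (shapes assumed for illustration).
public record UserCreatedEvent(int Id, string Username);
public record UserUpdatedEvent(int Id, string Username);

// Local replica holding only the fields the Social service needs.
public class UserReplica
{
    public int Id { get; set; }
    public string Username { get; set; }
}

public class UserEventsHandler
{
    private readonly SocialDbContext _db; // hypothetical EF Core context owned by the Social service

    public UserEventsHandler(SocialDbContext db) => _db = db;

    // Invoked by the bus subscription when the Identity service publishes an event.
    public async Task Handle(UserCreatedEvent e)
    {
        _db.Users.Add(new UserReplica { Id = e.Id, Username = e.Username });
        await _db.SaveChangesAsync();
    }

    public async Task Handle(UserUpdatedEvent e)
    {
        var user = await _db.Users.FindAsync(e.Id);
        if (user is null)
        {
            // A missed or out-of-order event: treat the update as a create (upsert).
            _db.Users.Add(new UserReplica { Id = e.Id, Username = e.Username });
        }
        else
        {
            user.Username = e.Username;
        }
        await _db.SaveChangesAsync();
    }
}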
With this eventually consistent approach, the Social service now has all the required data for the UI, all in one request!
// GET /api/posts/5 HTTP/1.1
// Host: my-social-service
{
"id": 5,
"content": "Cats are great, dogs are too. But, to be fair, the sun is much better.",
"author": {
"id": 1,
"username": "cat_sun_dog"
}
}
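For completeness, a minimal sketch of the corresponding endpoint in the Social service, assuming the local replica above (SocialDbContext and its sets remain hypothetical names); note that it never calls the Identity service:

[HttpGet("/api/posts/{id}")]
public async Task<IActionResult> GetPost(int id)
{
    var post = await _db.Posts.FindAsync(id);
    if (post is null)
    {
        return NotFound();
    }

    // The author is read from the locally replicated user data, not from the Identity service.
    // (A real implementation would also handle the case where the replica has not caught up yet.)
    var author = await _db.Users.FindAsync(post.AuthorId);

    return Ok(new
    {
        Id = post.Id,
        Content = post.Content,
        Author = new { Id = author.Id, Username = author.Username }
    });
}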
Benefits:

- The Social service is totally independent from the Identity service; it can work totally fine without it.

Drawbacks and questions:

- What happens when the UI later needs more user data that is not yet replicated, e.g. a ProfilePicture?

As described in Microsoft's guide for data, it's possible to create an API gateway that aggregates data from two requests: one to the Social service, and another to the Identity service.
Therefore, we can have an API gateway action (/api/posts/{id}) implemented as such, in pseudo-code for ASP.NET Core:
[HttpGet("/api/posts/{id}")]
public async Task<IActionResult> GetPost(int id)
{
    // First request: fetch the post from the Social service.
    var post = await _postService.GetPostById(id);
    if (post is null)
    {
        return NotFound();
    }

    // Second request: fetch the author's public profile from the Identity service.
    var author = await _userService.GetUserById(post.AuthorId);

    // Combine both results into the shape the client expects.
    return Ok(new
    {
        Id = post.Id,
        Content = post.Content,
        Author = new
        {
            Id = author.Id,
            Username = author.Username
        }
    });
}
Then, a client just uses the API gateway and gets all the data in one query, without any client-side overhead:
// GET /api/posts/5 HTTP/1.1
// Host: my-api-gateway
{
"id": 5,
"content": "Cats are great, dogs are too. But, to be fair, the sun is much better.",
"author": {
"id": 1,
"username": "cat_sun_dog"
}
}
Benefits:

Drawbacks and questions:

- If the Identity service is down, then even though this can be mitigated using the circuit breaker pattern (see the sketch below), the client won't see the author's name anyway.

Given these two options, aggregation at the API gateway and data replication on individual microservices using events, which one should be used in which situation, and how should each be implemented correctly?
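For the circuit-breaker mitigation mentioned in that drawback, a sketch of what it might look like in the gateway, assuming the Polly library (the thresholds and names are illustrative, not a definitive implementation):

using Polly;
using Polly.CircuitBreaker;

// Illustrative policy: after 3 consecutive failures, skip calls to Identity for 30 seconds.
private static readonly AsyncCircuitBreakerPolicy _identityBreaker =
    Policy.Handle<Exception>()
          .CircuitBreakerAsync(exceptionsAllowedBeforeBreaking: 3,
                               durationOfBreak: TimeSpan.FromSeconds(30));

[HttpGet("/api/posts/{id}")]
public async Task<IActionResult> GetPost(int id)
{
    var post = await _postService.GetPostById(id);
    if (post is null)
    {
        return NotFound();
    }

    object author = null;
    try
    {
        var user = await _identityBreaker.ExecuteAsync(() => _userService.GetUserById(post.AuthorId));
        author = new { Id = user.Id, Username = user.Username };
    }
    catch (Exception)
    {
        // Covers both a failing call and BrokenCircuitException once the circuit is open:
        // degrade gracefully and return the post without its author instead of failing entirely.
    }

    return Ok(new { Id = post.Id, Content = post.Content, Author = author });
}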
In general, I strongly favor state replication via events in durable log-structured storage over services making synchronous queries (synchronous in the logical sense, even if executed in a non-blocking fashion).
Note that all systems are, at a sufficiently high level, eventually consistent: because we don't stop the world to allow an update to a service to happen, there's always a delay from update to visibility elsewhere (including in a user's mind).
In general, if you lose your datastores, things get ruined. However, logs of immutable events give you active-passive replication for nearly free (you have a consumer of that log which replicates events to another datacenter): in a disaster you can make the passive side active.
If you need more events than you are already publishing, you just add a log. You can seed the log with a backfilled dump of synthesized events from the state before the log existed (e.g. dump out all the current ProfilePictures).
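A hedged sketch of such a backfill, assuming a hypothetical IEventPublisher abstraction and a UserProfilePictureSetEvent type (neither comes from the question); the Identity service would run it once when the new log is introduced:

// One-off backfill job: synthesize one event per existing user so new consumers
// can build their view of ProfilePicture even though no "real" event was ever
// published for data that predates the log.
public class ProfilePictureBackfillJob
{
    private readonly IdentityDbContext _db;      // hypothetical EF Core context of the Identity service
    private readonly IEventPublisher _publisher; // hypothetical abstraction over the log/bus

    public ProfilePictureBackfillJob(IdentityDbContext db, IEventPublisher publisher)
    {
        _db = db;
        _publisher = publisher;
    }

    public async Task RunAsync(CancellationToken ct)
    {
        foreach (var user in _db.Users.AsNoTracking())
        {
            await _publisher.PublishAsync(
                new UserProfilePictureSetEvent(user.Id, user.ProfilePictureUrl), ct);
        }
    }
}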
When you think of your event bus as a replicated log (e.g. by implementing it using Kafka), consumption of an event doesn't prevent arbitrarily many other consumers from coming along later (it's just incrementing your read-position in the log). So that allows for other consumers to come along and consume the log for doing their own remix. One of those consumers could be simply replicating the log to another datacenter (enabling that active-passive).
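To make the replicated-log point concrete, here is a small sketch using the Confluent.Kafka client (topic and group names are made up for illustration): each consumer group tracks its own offset, so adding a new consumer, such as a cross-datacenter replicator, never disturbs the existing ones.

using Confluent.Kafka;

// Each consumer group keeps its own read position (offset) in the log, so a new
// group can replay the same topic from the beginning without affecting others.
var config = new ConsumerConfig
{
    BootstrapServers = "kafka:9092",
    GroupId = "social-service-user-view",      // another group, e.g. "dr-replicator", reads independently
    AutoOffsetReset = AutoOffsetReset.Earliest // new groups replay the log from the start
};

using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("identity.user-events");

while (true)
{
    var result = consumer.Consume(CancellationToken.None);
    // Deserialize result.Message.Value and apply it to this service's local view;
    // the offset is then committed (automatically by default, or explicitly).
}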
Note that once you allow services to maintain their own views of the important bits of data from other services, you are in practice doing Command Query Responsibility Segregation (CQRS); it's thus a good idea to familiarize yourself with CQRS patterns.