java spring-boot performance event-sourcing command-query-separation

False success message due to race conditions and eventual consistency caused by CQRS with event sourcing


I'm very new to the concepts of CQRS and Event Sourcing, so please don't be too hard on me. My English isn't that good either, so I'm sorry if anything is hard to understand.

I want to program a microservice that handles user authentication. This microservice will be written in Java using Spring Boot. At the moment I have a REST API with an endpoint /register. This endpoint accepts a DTO with the attributes "email", "username" and "password".

The first methods called within this endpoint are isEmailUniqueQueryHandler and isUsernameUniqueQueryHandler with the corresponding query objects. If any of these attributes are not unique, the endpoint will return an error.

So far, so good. The next method called after validation is the (async) handle method of registerUserCommandHandler. This method is designed to be "fire and forget", so if an error occurs within this method, the controller will not know about it.

The only error that can occur within this method is on the line where the user object is stored in the database:

user.register(eventPublisher);

This line can throw a DataIntegrityViolationException if the given username or email is not unique. The exception is only thrown when two requests with the same username or email are sent within about 2 seconds of each other (when the application is running locally). In that window, the validation in the controller still reports the username and email as unique for the second request, because of the race condition and the separation of the write db (event store) from the read db (Postgres). The exception is caught within the method but, due to the "fire and forget" principle, it is never passed back to the controller.

Now the question is: is this false success message critical in a real-world scenario?

I hope somebody understands what I'm asking. Thank you in advance and have a nice day!


Solution

  • My answer will be language and framework agnostic. Treat it from the design perspective.

    Yes, such a false success is critical in a real scenario. What you are doing is called set validation, and it is a very good question to think about in the context of CQRS/ES. There are a couple of ways to improve the flow.

    First, let's distribute the responsibilities across the components.

    An endpoint handler is responsible for translating HTTP requests into something domain-specific (a command) and for translating the domain-specific response back into an HTTP response. It is not the handler's responsibility to validate uniqueness.

    A command handler calls the validation methods and decides whether to store an event or not. When the event is stored successfully, the command handler may return all the generated ids.
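
    As a minimal sketch of that responsibility in Java (the asker's language), the handler below makes the uniqueness decision itself and returns the generated id to its caller. All names are hypothetical, and the in-memory maps are only stand-ins for a durable store with unique constraints. Note that it swaps the separate "query, then write" check for an atomic reservation, which is one possible way to make the decision race-free, not the only one.

    ```java
    import java.util.Optional;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Hypothetical command: the domain-level request the endpoint builds from the HTTP body.
    final class RegisterUserCommand {
        final String email;
        final String username;

        RegisterUserCommand(String email, String username) {
            this.email = email;
            this.username = username;
        }
    }

    // Hypothetical handler: owns the uniqueness decision and reports the outcome to its caller.
    final class RegisterUserCommandHandler {
        // In-memory stand-ins for unique indexes (assumption: a real system uses a durable store).
        private final ConcurrentMap<String, UUID> emails = new ConcurrentHashMap<>();
        private final ConcurrentMap<String, UUID> usernames = new ConcurrentHashMap<>();

        // Returns the generated user id, or empty if the email or username is already taken.
        Optional<UUID> handle(RegisterUserCommand command) {
            UUID userId = UUID.randomUUID();
            // putIfAbsent is atomic, so two concurrent requests cannot both claim the same value.
            if (emails.putIfAbsent(command.email, userId) != null) {
                return Optional.empty();
            }
            if (usernames.putIfAbsent(command.username, userId) != null) {
                emails.remove(command.email, userId); // release the email claimed above
                return Optional.empty();
            }
            // The UserRegistered event would be stored here, before the id is returned.
            return Optional.of(userId);
        }
    }
    ```

    With a handler like this, the endpoint only has to map an empty result to an HTTP error (for example 409 Conflict) and a present id to a success response.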

    If registerUserCommandHandler is fire-and-forget, then at the end of handling the HTTP request you cannot be sure whether the operation succeeded or not. You can only be sure that the command was submitted.

    As an alternative, the command handler can register the user synchronously, but the user will be considered "not approved" at first.

    Next, an asynchronous process will pick up the "UserRegistered" event, perform the set validation and issue a "UserActivated" event, or a "UserSuspended" event if the validation fails.

    To avoid race conditions, the validating process could use a lock on a read model to make sure that the validation is handled by a single instance at a time.
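
    A rough Java sketch of such a validating process, under the assumption that it runs in a single JVM: the ReentrantLock stands in for a lock on the read model, the two sets stand in for the read model itself, and the class names and string-based outcome events are hypothetical simplifications.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical event carrying the data the validator needs.
    final class UserRegistered {
        final String userId;
        final String email;
        final String username;

        UserRegistered(String userId, String email, String username) {
            this.userId = userId;
            this.email = email;
            this.username = username;
        }
    }

    // Hypothetical validator: consumes UserRegistered and emits UserActivated or UserSuspended.
    final class SetValidationProcess {
        private final ReentrantLock lock = new ReentrantLock(); // stand-in for a read-model lock
        private final Set<String> takenEmails = new HashSet<>();
        private final Set<String> takenUsernames = new HashSet<>();
        private final List<String> emitted = new ArrayList<>(); // stand-in for the event store

        // Validates one registration at a time and returns the name of the emitted event.
        String on(UserRegistered event) {
            lock.lock();
            try {
                boolean unique = !takenEmails.contains(event.email)
                        && !takenUsernames.contains(event.username);
                if (unique) {
                    // Only activated users occupy an email and a username.
                    takenEmails.add(event.email);
                    takenUsernames.add(event.username);
                }
                String outcome = (unique ? "UserActivated:" : "UserSuspended:") + event.userId;
                emitted.add(outcome);
                return outcome;
            } finally {
                lock.unlock();
            }
        }

        // Returns a copy of everything emitted so far.
        List<String> emittedEvents() {
            lock.lock();
            try {
                return new ArrayList<>(emitted);
            } finally {
                lock.unlock();
            }
        }
    }
    ```

    With several instances of the service, an in-process lock like this is not enough; the lock would have to live in shared infrastructure, such as the read database.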

    In the meantime, the frontend may periodically (say, every second) poll the corresponding read model to eventually learn the outcome.
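
    The polling loop can be sketched like this. It is written in Java to stay consistent with the question, although a real frontend would more likely do this in JavaScript. RegistrationStatus, RegistrationPoller, and the Supplier standing in for a read-model query are all assumptions made for illustration.

    ```java
    import java.util.function.Supplier;

    // Hypothetical states of a registration as exposed by the read model.
    enum RegistrationStatus { PENDING, ACTIVATED, SUSPENDED }

    final class RegistrationPoller {
        // Queries the read model until the status leaves PENDING or the attempts run out.
        // Returns the last status seen, so PENDING means "outcome still unknown".
        static RegistrationStatus awaitOutcome(Supplier<RegistrationStatus> readModel,
                                               int maxAttempts, long intervalMillis) {
            RegistrationStatus status = RegistrationStatus.PENDING;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                status = readModel.get();
                if (status != RegistrationStatus.PENDING) {
                    return status;
                }
                if (attempt < maxAttempts - 1) {
                    try {
                        Thread.sleep(intervalMillis); // wait before asking again
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                        return status;
                    }
                }
            }
            return status;
        }
    }
    ```

    A caller would pass a supplier that queries the read model for the new user's status, for example with an interval of 1000 ms, and treat a final PENDING as "try again later" rather than as a failure.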

    On the one hand, this approach requires writing additional code and running a validating job. On the other hand, it eliminates the race condition and makes the flow more robust and explicit. You can even discuss it with a domain expert.