postgresql · relational-database · microservices

Relational DB in microservices


I have a monolithic application that currently uses a PostgreSQL DB, and the schema is set up as you would expect for most relational databases, with the various tables linked back to the user via foreign keys on user_id.

I'm trying to learn more about microservices and am attempting to migrate my Python API to a microservice architecture. I have a reasonable understanding of how I'm going to break the larger app up into smaller parts; however, I'm not entirely clear on how I'm supposed to deal with the data side of things.

I understand that one single large DB is against the general design principles of microservices, but I'm not clear on what the alternative would be.

My biggest concern is cascading deletes across the individual databases that would hold each microservice's data. In a single relational DB, I can just cascade on delete and the DB will handle the work across the various tables. In the case of microservices, how would that work? Would I need a separate service that handles deleting user data across the other services' DBs?

In short, how would I migrate a traditional application with a relational DB to a microservice architecture?

EDIT:

To clarify - a specific architectural/design problem I'm facing is as follows:

I have split up my application into a few microservices. The ones that, to my mind, are still relational are:

Geolocation - A service that checks geometry data, records it in PostGIS, and returns certain information. Its primary purpose is to record the location of a particular user for later reference.

Image - A simple upload service to upload images and store their metadata in the DB.

Load-Image - A simple service that returns a random set of images based on parameters such as location, and on user profile data such as age, gender, etc.

Profile - A service that simply manages user data such as age, gender, etc.

Normally, these services would each have a table in one larger DB rather than their own individual DBs. Filtering images by, say, location and age is a very simple JOIN and filter, as sketched below.
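In the monolith, that could be a single query. Here's a minimal sketch, assuming hypothetical tables images, profiles, and locations keyed on user_id (the table names, columns, and DSN are illustrative, not my actual schema):

```python
# Minimal monolith-style query: one JOIN across images, profiles, and
# PostGIS location data. Table/column names and the DSN are assumptions.
import psycopg2

def find_images(min_age, max_age, lon, lat, radius_m, limit=20):
    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT i.url
                FROM images i
                JOIN profiles  p ON p.user_id = i.user_id
                JOIN locations l ON l.user_id = i.user_id
                WHERE p.age BETWEEN %s AND %s
                  AND ST_DWithin(
                        l.geom::geography,
                        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                        %s)
                ORDER BY random()   -- "random set of images"
                LIMIT %s
                """,
                (min_age, max_age, lon, lat, radius_m, limit),
            )
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```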

How would something like that work in a microservice architecture? If the data is held in entirely different DBs, how would I set up the logic to filter it? I could duplicate data that doesn't change often, like profile info, and add it to a MongoDB document that would contain the image data, user_id, and profile data; however, location data can change regularly, and constant updates don't sound practical.
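For what it's worth, the denormalized document I have in mind might look something like this (the shape and field names are hypothetical):

```python
# Possible shape for a denormalized image document: slow-changing profile
# fields are copied in at upload time; location is deliberately left out
# because it changes too often to keep in sync. Field names are
# hypothetical, for illustration only.
image_doc = {
    "image_id": "img-123",
    "url": "https://cdn.example.com/img-123.jpg",
    "user_id": "user-42",
    "profile": {  # duplicated from the Profile service
        "age": 29,
        "gender": "F",
    },
}
```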

What would be the best approach? Or should I stick with a shared RDBMS for just those few services?


Solution

  • It comes down to the duplication of data, why we want it, and how we manage it.

    Early in our careers we were taught to duplicate data for redundancy, for example in database replication or backups. We were also taught that data can be modelled in a relational manner, with constraints enforcing the integrity of the model. In fact, the integrity of the model is sacrosanct. Without integrity, how can you have consistency? The answer is that you can't. Kinda.

    When you work with distributed systems and service orientation, you do so because you want to minimise interactions, thereby reducing coupling between components. However, there is a cost to this: the more distributed your architecture, the less coupling it has, and the more duplication of data will be necessary. This is taken to an extreme with microservices, where effectively the same data may be present in many different places, in varying degrees of consistency.

    Far from being bad, in this context duplication of data is an essential feature of your system. It is an enabler of an architectural style with many great benefits. Put another way: without duplication of data you get less distribution and more coupling, which makes your system more expensive to build, own, and change.

    So, now that we understand duplication of data and why we want it, let's move on to how we manage having lots of it. Let's try an example:

    In a relational database, let's say we have a table called Customers, which contains a customer ID and customer details, and another table called Orders, which contains the order ID, customer ID, and the order details. Let's say we also have an ordering application, which needs to delete all of a customer's orders when the customer is deleted, for GDPR compliance.
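    For reference, the monolith makes this requirement trivial because the database enforces it declaratively. A minimal sketch, with simplified and assumed column definitions:

    ```python
    # In the monolith the relationship lives in one database, so ON DELETE
    # CASCADE makes "delete customer + all their orders" a single atomic
    # statement. The DDL and DSN here are simplified assumptions.
    import psycopg2

    DDL = """
    CREATE TABLE customers (
        customer_id SERIAL PRIMARY KEY,
        details     JSONB
    );
    CREATE TABLE orders (
        order_id    SERIAL PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers (customer_id) ON DELETE CASCADE,
        details     JSONB
    );
    """

    with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
        cur.execute(DDL)
        # Later, one statement removes the customer AND all their orders,
        # atomically -- no orchestration required:
        # cur.execute("DELETE FROM customers WHERE customer_id = %s", (42,))
    ```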

    Because we are migrating our system to microservices, we decide to create a service called Customers.

    So we create the Customers service with the following operation:

      • Delete Customer - removes the customer record for a given customer ID

    We create another service called Orders with the following operations:

      • Get Orders for Customer - returns all order IDs for a given customer ID

      • Delete Order - removes a single order by its order ID
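    Sketched as HTTP endpoints, the two services might look like this (Flask, the routes, and the in-memory dicts are illustrative assumptions; both services are shown in one file purely for brevity):

    ```python
    # Illustrative sketch of the Customers and Orders operations as HTTP
    # endpoints. In reality each service would own its own database; the
    # dicts below just stand in for them.
    from flask import Flask, jsonify

    app = Flask(__name__)

    customers = {1: {"name": "Ada"}}                     # Customers data
    orders = {10: {"customer_id": 1, "item": "widget"}}  # Orders data

    # --- Customers service ---
    @app.route("/customers/<int:customer_id>", methods=["DELETE"])
    def delete_customer(customer_id):
        customers.pop(customer_id, None)  # deleting twice still succeeds
        return "", 204

    # --- Orders service ---
    @app.route("/customers/<int:customer_id>/orders", methods=["GET"])
    def get_orders_for_customer(customer_id):
        ids = [oid for oid, o in orders.items()
               if o["customer_id"] == customer_id]
        return jsonify(ids)

    @app.route("/orders/<int:order_id>", methods=["DELETE"])
    def delete_order(order_id):
        orders.pop(order_id, None)
        return "", 204
    ```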

    We build a UX screen for deleting a customer. The UX first calls the Orders service to get all the orders for the customer. Then it iterates over the list of orders, calling the Orders service to delete each order. Then it calls the Customers service to delete the customer.
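    That caller-side orchestration might look like the following sketch (the requests library and the service URLs are assumptions):

    ```python
    # "Delete Customer" orchestrated from the caller. There is no
    # cross-service transaction, so any individual call can fail partway.
    import requests

    ORDERS = "http://orders-service"        # hypothetical base URLs
    CUSTOMERS = "http://customers-service"

    def delete_customer_everywhere(customer_id):
        # 1. Ask the Orders service for all of the customer's orders.
        resp = requests.get(f"{ORDERS}/customers/{customer_id}/orders")
        resp.raise_for_status()

        # 2. Delete each order individually.
        for order_id in resp.json():
            requests.delete(f"{ORDERS}/orders/{order_id}").raise_for_status()

        # 3. Finally, delete the customer record itself.
        requests.delete(
            f"{CUSTOMERS}/customers/{customer_id}"
        ).raise_for_status()
    ```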

    This example is very simplistic, but as you can see, there is no option but to orchestrate the "Delete Customer" operation from the caller, which in this case is the user interface. Of course, what would be a single atomic transaction in a database does not translate to multiple HTTP(S) calls, so it is possible that some of the calls may fail, leaving the system as a whole in an inconsistent state. In that case the inconsistency would need to be resolved via some recovery mechanism, such as the retry sketched below.
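    One possible recovery mechanism, sketched under the assumption that every delete above is idempotent (deleting an already-deleted resource still succeeds), is simply to retry the whole sequence until it completes:

    ```python
    # Naive recovery sketch: retry the whole orchestration. Safe only
    # because each individual delete is idempotent. Anything still failing
    # after the retries must be flagged for later reconciliation.
    import time
    import requests

    def delete_customer_with_retry(customer_id, attempts=5, delay_s=2.0):
        for _ in range(attempts):
            try:
                delete_customer_everywhere(customer_id)  # sketch above
                return True
            except requests.RequestException:
                time.sleep(delay_s)  # back off, then re-run from the top
        return False  # still inconsistent: needs manual/async reconciliation
    ```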