microservicesdatabase-migrationcqrsdistributed-database

How to handle data migrations in distributed microservice databases


so im learning about microservices and common patterns and i cant seem to find how to address this one issue.

Lets say that my customer needs a module managing customers, and a module managing purchase orders. I believe that when dealing with microservices its pretty natural to split these two functionalities into separate services - each having its own data.

  1. CustomerService
  2. PurchaseOrderService

Also, he wants to have a table of purchase orders displaying the data of both customers and purchase orders, ie .: Customer name, Order number.

Now, i dont want to use the API Composition pattern because the user must be able to sort over any column he wants which (afaik) is impossible to do without slaughtering the performance using that pattern.

Instead, i choose CQRS pattern

  1. after every purchase order / customer update a message is sent to the message broker
  2. message broker notifies the third service about that message
  3. the third service updates its projection in its own database

So, our third service .:

  1. PurchaseOrderTableService

It stores all the required data in the single database - now we can query it, sort over any column we like while still maintaining a good performance.

And now, the tricky part .: In the future, client can change his mind and say "Hey, i need the purchase orders table to display additional column - 'Customer country'"

How does one handle that data migration? So far, The PurchaseOrderTableService knows only about two columns - 'Customer name' and 'Order number'. I imagine that this probably a pretty common problem, so what can i do to avoid reinventing the wheel?

I can of course make CustomerService generate 'CustomerUpdatedMessage' for every existing customer which would force PurchaseOrderTableService to update all its projections, but that seems like a workaround.

If that matters, the stack i thought of is java, spring, kafka, postgresql.


Solution

  • Divide the problem in 2:

    1. Keeping live data in sync: your projection service from now on also needs to persist Customer Country, so all new orders will have the country as expected.

    2. Backfill the older orders: this is a one off operation, so how you implement it really depends on your organization, technologies, etc. For example, you or a DBA can use whatever database tools you have to extract the data from the source database and do a bulk update to the target database. In other cases, you might have to solve it programmatically, for example creating a process in the projection microservice that will query the Customer's microservice API to get the data and update the local copy.

    Also note that in most cases, you will already have a process to backfill data, because the need for the projection microservice might arrive months or years after the orders and customers services were created. Other times, the search service is a 3rd party search engine, like Elastic Search instead of a database. In those cases, I would always keep in hand a process to fully reindex the data.