Tags: python, mysql, database, replication, fault-tolerance

How to create a "buffer" program to act as a holding area for MySQL data rows that will then be passed on when the cloud DB is available


I want to create a Python 3 program that takes in MySQL data and holds it temporarily, and can then pass this data on to a cloud MySQL database.

The idea is that it acts as a buffer for entries in the event that my local network goes down; the buffer would then be able to pass those entries on at a later date, theoretically providing fault tolerance.

I have done some research into replication and GTIDs, and I'm currently in the process of learning these concepts. However, I would like to write my own solution, or at least have it be a smaller program rather than a full server-side replication setup.

I already have a program that generates some MySQL data to fill my DB; the key part I need help with is the buffer aspect/implementation (the code I have isn't important, as I can rework it later).

I would greatly appreciate any good resources or help, thank you!


Solution

  • I would implement what you describe using a message queue.

    Example: https://hevodata.com/learn/python-message-queue/

    The idea is to run a message queue service on your local computer. Your Python application pushes items into the MQ instead of committing directly to the database.
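
    For a concrete picture, here is a minimal producer sketch. It assumes a RabbitMQ broker running locally and the pika client; the queue name "pending_writes" and the row fields are made up for illustration, and any MQ with persistent storage would work the same way.

    ```python
    import json

    import pika

    # Connect to a RabbitMQ broker on the local machine (assumed installed).
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # durable=True makes the queue itself survive a broker restart.
    channel.queue_declare(queue="pending_writes", durable=True)

    def enqueue_row(row: dict) -> None:
        """Push one row into the local queue instead of INSERTing directly."""
        channel.basic_publish(
            exchange="",
            routing_key="pending_writes",
            body=json.dumps(row),
            # delivery_mode=2 marks the message persistent, so it is written
            # to disk and survives a broker restart.
            properties=pika.BasicProperties(delivery_mode=2),
        )

    # Hypothetical row; your generator program would supply the real data.
    enqueue_row({"sensor_id": 42, "reading": 3.14, "ts": "2023-01-01T00:00:00Z"})
    ```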

    Then you need a background task, called a worker (which you may also write in Python or another language), that consumes items from the MQ and writes them to the cloud database when it's available. If the cloud database is not available, the worker pauses.
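
    A worker for that queue might look like the sketch below. It assumes the same RabbitMQ queue plus the pymysql client; the cloud host, credentials, table, and columns are placeholders. The key detail is acknowledging a message only after the INSERT commits, so nothing is lost if the write fails partway.

    ```python
    import json
    import time

    import pika
    import pymysql

    QUEUE = "pending_writes"  # same queue the producer publishes to

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    while True:
        # Pull one message; method is None when the queue is empty.
        method, _properties, body = channel.basic_get(queue=QUEUE)
        if method is None:
            time.sleep(1)
            continue
        row = json.loads(body)
        try:
            # Placeholder connection details; reconnecting per message keeps
            # the sketch simple at the cost of some overhead.
            db = pymysql.connect(host="cloud-db.example.com", user="app",
                                 password="secret", database="mydb")
            try:
                with db.cursor() as cur:
                    cur.execute(
                        "INSERT INTO readings (sensor_id, reading, ts) "
                        "VALUES (%s, %s, %s)",
                        (row["sensor_id"], row["reading"], row["ts"]),
                    )
                db.commit()
            finally:
                db.close()
            # Acknowledge (remove from the queue) only after a successful commit.
            channel.basic_ack(delivery_tag=method.delivery_tag)
        except pymysql.err.OperationalError:
            # Cloud DB unreachable: requeue the message and pause before retrying.
            channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
            time.sleep(30)
    ```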

    The data in the MQ can grow while the background worker is paused. If this goes on too long, you may run out of space, but as long as the growth rate is slow and the cloud database is available regularly, the risk of that is low.
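
    One way to keep an eye on that is to check the queue depth periodically. With RabbitMQ and pika (same assumptions as above), a passive declare reports how many messages are waiting:

    ```python
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # passive=True inspects the existing queue without creating it;
    # it raises if the queue does not exist.
    q = channel.queue_declare(queue="pending_writes", passive=True)
    print("messages waiting:", q.method.message_count)
    ```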


    Re your comment about performance.

    This is a different application architecture, so there are pros and cons.

    On the one hand, if your application is "writing" to a local MQ instead of the remote database, writes will appear to the app to have lower latency, since a local enqueue avoids the network round trip to the cloud.

    On the other hand, posting to the MQ does not write to the database immediately. There is still the step of the worker pulling the item and issuing its own write to the database. So from the application's point of view, there is a brief delay before the data appears in the database, even when the database is available.

    That means the app can't depend on the data being queryable immediately after it pushes an item to the MQ. It might be pretty prompt, under 1 second, but that's not the same as writing to the database directly, which guarantees the data is queryable as soon as the write returns.

    The performance of the worker writing the item to the database should be identical to that of the app writing that same item to the same database. From the database perspective, nothing has changed.