We currently process events in FIFO order in our Rails application using a Sidekiq queue. For cost and a few other reasons, we are moving from Sidekiq to Shoryuken (with an SQS FIFO queue).
Currently, we run the Sidekiq worker with `concurrency: 1` so that it processes one job at a time and maintains FIFO order.
SQS FIFO Queue will also handle this for us after we migrate to Shoryuken.
My problem is this: how do I migrate from Sidekiq to Shoryuken without any downtime while still maintaining the FIFO processing order of my events? There are 5 parallel Rails Puma instances that publish the events to Sidekiq (and will publish to SQS after the migration). Even restarting all 5 of them at the same time does not guarantee that no event is published to Sidekiq after the first event has been published to SQS. To explain the issue, I have described an example as follows:
Example:
How do I avoid this?
Any sort of help or suggestion is appreciated. Thanks in advance!
I ended up doing my migration from Sidekiq to Shoryuken with two successive deployments and two flags/configs. The solution suggested by Kenneth is largely similar to mine; I just added a few more steps to ensure the FIFO processing order of my events. The whole migration from Sidekiq-Redis to Shoryuken-SQS (FIFO) can be performed by following the steps below, in order:
1. Deployment 1: deploy a Publish Enabled Config to start/stop pushing events to any queue (Sidekiq or SQS). Note that the created events must be backed up somewhere for this to work. In our case, we were already saving each created event in an `events` table in our DB before pushing it.
2. Turn OFF the Publish Enabled Config. Let's say we turned it OFF after pushing the event with id=100.
3. While it is OFF, keep saving the event_ids of all the un-pushed events in Redis, under a key called `unprocessed_event_ids`.
4. Wait until all the events that have already been queued in Sidekiq are processed (i.e., up to event_id=100).
5. Deployment 2: deploy the SQS code, i.e. the code changes that push the events to SQS instead of Sidekiq, together with a second config, Processing Via Shoryuken Enabled Config (turned OFF). This config tells the Shoryuken worker whether it is allowed to process events from SQS. While it is OFF, the worker should raise an exception and so stay stuck retrying the first event in SQS. Note that the publishing code must still check the Publish Enabled Config.
6. Make sure that the Processing Via Shoryuken Enabled Config is OFF, and then turn ON the Publish Enabled Config. This starts pushing the events to SQS. Let's say we turned it ON after creating the event with id=150.
7. Process the intermediate events that were not pushed to any queue (neither Sidekiq nor SQS), i.e. the event_ids saved in Redis under `unprocessed_event_ids`, by running a script that processes them in order. In our example, these are the events from event_id=101 to event_id=150.
8. Turn ON the Processing Via Shoryuken Enabled Config. This starts processing the events from SQS.
9. Migration is complete. Clean up the unnecessary configs/flags in another deployment, whenever convenient.
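
For anyone who wants to see the flow end to end, here is a minimal, self-contained Ruby sketch of the steps above. It is not our production code: a plain hash and arrays stand in for the configs, Redis, Sidekiq and SQS, and every name in it (`FLAGS`, `publish_event`, `perform_from_sqs`, `replay_unprocessed_events`, `ProcessingPaused`) is made up for illustration. It also assumes that event ids increase in creation order, so sorting the parked ids gives the FIFO order.

```ruby
# Hypothetical stand-ins: in a real setup the flags and the parked ids would
# live in a shared store such as Redis, and the queues would be Sidekiq / SQS.
FLAGS = { publish_enabled: true, shoryuken_processing_enabled: false }
UNPROCESSED_EVENT_IDS = [] # ids created while publishing was switched off
PROCESSED = []             # records the order in which events were handled

class ProcessingPaused < StandardError; end

# Publisher side: called after the event row has been saved to the events table.
# When publishing is off, the id is parked instead of being pushed to a queue.
def publish_event(event_id, queue)
  if FLAGS[:publish_enabled]
    queue << event_id
  else
    UNPROCESSED_EVENT_IDS << event_id
  end
end

# Business logic shared by both workers and by the replay script.
def process_event(event_id)
  PROCESSED << event_id
end

# Consumer side: what the body of the Shoryuken worker's perform would do.
# Raising here means the message is not acknowledged, so it is retried later.
def perform_from_sqs(event_id)
  unless FLAGS[:shoryuken_processing_enabled]
    raise ProcessingPaused, "Shoryuken processing is switched off"
  end
  process_event(event_id)
end

# Replay script: handle the parked events in id order (assumed creation order).
def replay_unprocessed_events
  UNPROCESSED_EVENT_IDS.sort.each { |id| process_event(id) }
  UNPROCESSED_EVENT_IDS.clear
end

# --- Walkthrough with the ids used in the steps above ---
sidekiq_queue = []
sqs_queue = []

(98..100).each { |id| publish_event(id, sidekiq_queue) }      # normal operation
FLAGS[:publish_enabled] = false                               # publishing OFF
(101..150).each { |id| publish_event(id, sidekiq_queue) }     # parked, not queued
process_event(sidekiq_queue.shift) until sidekiq_queue.empty? # Sidekiq drains

# Second deployment: publishers now target SQS, consumer is still gated.
FLAGS[:publish_enabled] = true                                # publishing ON
(151..153).each { |id| publish_event(id, sqs_queue) }

begin
  perform_from_sqs(sqs_queue.first) # gated: raises, message stays at the head
rescue ProcessingPaused
  # nothing was processed; the first SQS event keeps waiting
end

replay_unprocessed_events                                     # handles 101..150
FLAGS[:shoryuken_processing_enabled] = true                   # open the gate
perform_from_sqs(sqs_queue.shift) until sqs_queue.empty?      # handles 151..153

puts PROCESSED == (98..153).to_a
```

One thing to double-check in a real setup: if the SQS queue has a redrive policy (dead-letter queue) with a low `maxReceiveCount`, the deliberately failing first message could be moved to the DLQ while the gate is closed, so that limit probably needs to be generous or disabled for the duration of the migration.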
This is how I handled the migration of FIFO event processing from Sidekiq-Redis to Shoryuken-SQS (FIFO) without any downtime. Any suggestion or feedback is appreciated.
Thanks!