GCP pubsub replay DLQ messages only to failed subscriber


We have a system that was built using GCP pubsub where there is a main "events" topic that receives different messages from many sources and publishes all of these to many different subscribers. The subscribers only action certain messages.

When a message on a subscription fails, it goes to a DLQ pubsub topic - we have one of these for every subscriber.

The problem is that in GCP it seems that you cannot replay a message to an individual pubsub subscriber - you must send it back to the topic. This causes problems when the same message was sent to multiple subscribers and only one of them failed: if you replay, every subscriber that already processed the message successfully receives it a second time.

When this system was designed, the original developer assumed that it worked like SNS fanning out to multiple SQS queues in AWS, where you can just move messages from the SQS DLQ back to the queue they failed on. Unfortunately this doesn't seem possible.

I have drawn a diagram of the problem (image: "GCP pubsub system"). When a message fails on subscriber A, it eventually goes to the DLQ A subscriber. I want it to then follow the green line - but it seems only the orange line is possible.

How have you handled this scenario in GCP? Any suggestions for fixing the system architecture would be welcome.


Solution

  • There are two main ways to deal with sending messages back from the DLQ to a single subscription:

    1. Create an additional topic for the replayed messages and have the subscriber listen for messages on both the original topic and the replay topic. Republish the messages to the replay topic. In this scenario, you probably want an independent replay topic for each subscription.
    2. Add a filter to every subscription that accepts all messages that are not replays, plus replayed messages that target that subscription. For example, on every message that is replayed, you could add an attribute with key set to replay and value set to the subscription for which it is intended. Each subscription would have a filter to receive messages where replay is not present as a key or where the value for replay is that subscription. Normally published messages would have no replay attribute, so every subscription receives them; those that you do replay would have the attribute set to the target subscription. Subscribers for that target subscription would receive the replayed messages, but they'd be filtered out for the others.
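
    As a rough sketch of the second option: the helpers below build the filter expression each subscription would use and the attributes a replayed message would carry. The function names and the subscription names ("sub-a", "sub-b") are hypothetical; the filter string follows the Pub/Sub filter syntax (`attributes:KEY` tests that a key is present, `attributes.KEY = "value"` tests its value). `would_deliver` is only a local model of that filter for reasoning and testing, not something Pub/Sub provides. Note that, as far as I know, a filter can only be set when a subscription is created, so existing subscriptions would have to be recreated.

```python
REPLAY_KEY = "replay"  # attribute key from the answer; the name is a convention, not a Pub/Sub built-in


def replay_filter(subscription_id: str) -> str:
    """Filter expression for one subscription: accept normal messages
    (no replay attribute) and replays addressed to this subscription."""
    return (
        f"NOT attributes:{REPLAY_KEY} "
        f'OR attributes.{REPLAY_KEY} = "{subscription_id}"'
    )


def replay_attributes(original: dict, subscription_id: str) -> dict:
    """Attributes for a republished message: the original attributes
    plus the replay marker naming the target subscription."""
    attrs = dict(original)
    attrs[REPLAY_KEY] = subscription_id
    return attrs


def would_deliver(attributes: dict, subscription_id: str) -> bool:
    """Local model of the filter above (for tests only)."""
    return REPLAY_KEY not in attributes or attributes[REPLAY_KEY] == subscription_id


if __name__ == "__main__":
    print(replay_filter("sub-a"))
    replayed = replay_attributes({"type": "order"}, "sub-a")
    print(would_deliver(replayed, "sub-a"), would_deliver(replayed, "sub-b"))
```

    The string from `replay_filter` would be passed as the subscription's filter at creation time, and the dict from `replay_attributes` as the attributes when republishing the failed message to the main topic.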

    The tradeoff between the two is complexity and cost. In the first case, you have to set up a second topic for each subscription and write the subscriber code to consume messages from it. In the second case, you don't have to set up that infrastructure, but since you pay for filtered-out message delivery, you have to consider the costs, especially if you have to replay a lot of messages or have a lot of subscriptions.
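
    For the first option, the replay step itself could look roughly like the sketch below: pull a batch from the DLQ subscription, republish each message to that subscriber's replay topic, and acknowledge it on the DLQ only after the publish has succeeded. The function name and the resource paths are hypothetical. The clients are passed in as parameters; the calls used (`pull`, `acknowledge`, `publish(...).result()`) are the ones exposed by `pubsub_v1.SubscriberClient` and `pubsub_v1.PublisherClient` in the google-cloud-pubsub Python library.

```python
def replay_dead_letters(subscriber, publisher, dlq_subscription_path,
                        replay_topic_path, max_messages=100):
    """Move one batch of messages from a DLQ subscription to a replay topic.

    Returns the number of messages that were republished and acknowledged.
    """
    response = subscriber.pull(
        request={"subscription": dlq_subscription_path,
                 "max_messages": max_messages}
    )

    ack_ids = []
    for received in response.received_messages:
        message = received.message
        # Wait for the publish to complete so a message is never acked
        # on the DLQ unless it has actually reached the replay topic.
        future = publisher.publish(
            replay_topic_path, message.data, **dict(message.attributes)
        )
        future.result()
        ack_ids.append(received.ack_id)

    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": dlq_subscription_path, "ack_ids": ack_ids}
        )
    return len(ack_ids)


# Usage with the real clients (paths are placeholders):
#   from google.cloud import pubsub_v1
#   replay_dead_letters(
#       pubsub_v1.SubscriberClient(),
#       pubsub_v1.PublisherClient(),
#       "projects/my-project/subscriptions/subscriber-a-dlq-sub",
#       "projects/my-project/topics/subscriber-a-replay",
#   )
```

    If a publish fails partway through a batch, the exception propagates before anything is acknowledged, so the whole batch is redelivered from the DLQ later; messages already republished in that batch would then be replayed twice, so the subscriber should still tolerate duplicates.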