kubernetes, google-kubernetes-engine, publish-subscribe, google-cloud-pubsub

Multiple replicas of a Kubernetes pod subscribing to a Google Cloud Pub/Sub topic


I'm setting up a service running in a Kubernetes pod that subscribes to a Pub/Sub topic via a pull subscription. The service consumes messages published to the topic.

Now I've scaled the deployment to 10 replicas and I'm observing that sometimes, when a new message is published to the topic, several pods receive the same message at the same time. It's not that pod 1 fails to ack before the ackDeadline and the message gets redelivered to pod 2; pod 1 and pod 2 receive the same message within milliseconds of each other.

How do I set up my subscription / Kubernetes so that each message is received and processed by only one pod?
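For illustration, here's a minimal sketch of the kind of pull subscriber I'm describing, assuming the Python client; the project ID, subscription ID, and handler body are placeholders:

```python
from google.cloud import pubsub_v1

# Placeholder identifiers -- substitute your own project and subscription.
PROJECT_ID = "my-project"
SUBSCRIPTION_ID = "my-subscription"


def handle_message(message: pubsub_v1.subscriber.message.Message) -> None:
    # Application-specific processing would go here.
    print(f"Received: {message.data!r}")
    message.ack()


def main() -> None:
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

    # Every pod replica runs this same streaming pull; Pub/Sub spreads messages
    # across the open streams, but delivery is at-least-once.
    future = subscriber.subscribe(subscription_path, callback=handle_message)
    with subscriber:
        future.result()  # block until the stream fails or is cancelled


if __name__ == "__main__":
    main()
```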


Solution

  • Update: As of today (20/02/2024), Pub/Sub has exactly-once delivery (QoS 2, once and only once), so enabling that on the subscription would fix this (see the exactly-once sketch after this answer).

    Old response: There is no QoS 2 (deliver once and only once) in Pub/Sub at this time, so unfortunately there's no out-of-the-box way to do what you're describing.

    Official documentation on this is here: https://cloud.google.com/pubsub/docs/subscriber

    The suggested way to do this is to use Apache Beam's model instead, which uses time windowing to get the behavior you're after. Google offers Dataflow, a hosted Apache Beam runner, as an option (see the windowing sketch after this answer).

    The other way (big maybe) to do this, which you could implement yourself, is to keep a variable in shared memory (perhaps in Memorystore, accessed from Kubernetes) that tracks the last-acked message, and to use ordered delivery from Pub/Sub. Each pod would fetch messages (in order) from Pub/Sub and rely on the shared variable, rather than Pub/Sub's own mechanism, to decide whether a message has already been handled. You'd still ack each message as you handle it in K8s, but you'd also update the variable in Memorystore (a rough sketch of this follows the answer as well).

    I BELIEVE that should shrink the window for re-entrancy problems, but it's still there: the latency between reading the variable and setting it is big enough that you could still double-process a message.

    This is why the recommended approach is windowing via Beam. Dataflow also scales arbitrarily large, so from a performance standpoint it would likely match Kubernetes. It is Beam, so it's a different tech stack to learn and it's not simple, but it's the tool for this particular job.
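
    A sketch of the exactly-once path mentioned in the update, assuming the Python client: the subscription is created with enable_exactly_once_delivery, and the subscriber acks with ack_with_response() so it can confirm the ack was accepted. The project, topic, and subscription names are placeholders.

    ```python
    from google.cloud import pubsub_v1
    from google.cloud.pubsub_v1.subscriber import exceptions as sub_exceptions

    PROJECT_ID = "my-project"            # placeholder
    TOPIC_ID = "my-topic"                # placeholder
    SUBSCRIPTION_ID = "my-subscription"  # placeholder


    def create_exactly_once_subscription() -> None:
        subscriber = pubsub_v1.SubscriberClient()
        topic_path = subscriber.topic_path(PROJECT_ID, TOPIC_ID)
        subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
        with subscriber:
            subscriber.create_subscription(
                request={
                    "name": subscription_path,
                    "topic": topic_path,
                    "enable_exactly_once_delivery": True,
                }
            )


    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # ... process the message here ...
        try:
            # With exactly-once delivery, ack_with_response() returns a future
            # that confirms whether Pub/Sub accepted the ack.
            message.ack_with_response().result()
        except sub_exceptions.AcknowledgeError as e:
            # The ack was rejected (e.g. expired ack ID); the message will be redelivered.
            print(f"Ack failed: {e.error_code}")
    ```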
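
    A sketch of the Beam/Dataflow windowing approach, assuming the Python SDK: read from the topic as an unbounded source, apply fixed time windows, then process per window. The topic path and the processing step are placeholders.

    ```python
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    from apache_beam.transforms import window

    TOPIC = "projects/my-project/topics/my-topic"  # placeholder


    def run() -> None:
        options = PipelineOptions()
        options.view_as(StandardOptions).streaming = True  # Pub/Sub sources are unbounded

        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
                | "FixedWindows" >> beam.WindowInto(window.FixedWindows(60))  # 60-second windows
                | "Process" >> beam.Map(lambda payload: payload)  # placeholder for real processing
            )


    if __name__ == "__main__":
        run()
    ```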
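
    And a rough sketch of the shared-variable idea, assuming an ordered subscription and Memorystore for Redis: each pod checks a last-acked publish timestamp in Redis before processing and updates it afterward. The hostnames, key name, and handler are placeholders, and the race noted in the answer is still present between the read and the write.

    ```python
    import redis
    from google.cloud import pubsub_v1

    # Placeholders: point these at your Memorystore instance and subscription.
    REDIS_HOST = "10.0.0.3"
    PROJECT_ID = "my-project"
    SUBSCRIPTION_ID = "my-ordered-subscription"  # created with message ordering enabled
    LAST_ACKED_KEY = "last_acked_publish_time"

    r = redis.Redis(host=REDIS_HOST, port=6379)


    def handle(message: pubsub_v1.subscriber.message.Message) -> None:
        publish_ts = message.publish_time.timestamp()

        # If another pod already acked a message at or after this publish time,
        # treat this delivery as a duplicate and ack without processing.
        last_acked = float(r.get(LAST_ACKED_KEY) or 0)
        if publish_ts <= last_acked:
            message.ack()
            return

        # ... process the message here ...

        # Record progress in shared memory, then ack in Pub/Sub.
        # Note the race: another pod can slip in between the GET above and this SET.
        r.set(LAST_ACKED_KEY, publish_ts)
        message.ack()


    def main() -> None:
        subscriber = pubsub_v1.SubscriberClient()
        path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
        with subscriber:
            subscriber.subscribe(path, callback=handle).result()
    ```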