I have a use case where I want to read messages from Pub/Sub without acknowledging them. I need help figuring out how to rule out the "duplicate messages" that remain in the Pub/Sub store when I don't ACK the delivered messages.
Solutions that I have thought of:
As far as I can tell, Pub/Sub has no offset feature like Kafka's.
Which approach would you suggest for this, or is there an alternative approach I could use?
I am using the Python google-cloud-pubsub_v1 library to create a client and pull messages from Pub/Sub.
Here is the code I use to pull the data:
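```python
from google.cloud import pubsub_v1

project_id = "your-project-id"  # placeholder
subscription_name = "your-subscription"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_name)

NUM_MESSAGES = 3
# The subscriber pulls a specific number of messages (request-dict style of google-cloud-pubsub >= 2.0).
response = subscriber.pull(request={"subscription": subscription_path, "max_messages": NUM_MESSAGES})
for received_message in response.received_messages:
    print(received_message.message.data)
```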
It sounds like Pub/Sub is probably not the right tool for the job. It seems as if you are trying to use Pub/Sub as a persistent data store, which is not its intended use case. Acking is a fundamental part of the lifecycle of a Cloud Pub/Sub message: unacknowledged messages are deleted once the subscription's message retention period expires, and that period cannot be longer than 7 days.
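For comparison, the usual pull loop acknowledges what it has received so that Pub/Sub stops redelivering it. The sketch below is a minimal illustration, assuming `subscriber` is a `google.cloud.pubsub_v1.SubscriberClient` from google-cloud-pubsub >= 2.0 (request-dict style); the helper name `pull_and_ack` is hypothetical.

```python
from typing import Any, List


def pull_and_ack(subscriber: Any, subscription_path: str, max_messages: int = 3) -> List[bytes]:
    """Hypothetical helper: pull up to max_messages, then acknowledge them.

    `subscriber` is assumed to be a pubsub_v1.SubscriberClient
    (google-cloud-pubsub >= 2.0, which accepts a `request` dict).
    """
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )
    payloads: List[bytes] = []
    ack_ids: List[str] = []
    for received_message in response.received_messages:
        payloads.append(received_message.message.data)
        ack_ids.append(received_message.ack_id)
    if ack_ids:
        # Acknowledging tells Pub/Sub these messages were processed,
        # so they are not redelivered to this subscription.
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return payloads
```

Taking the client as a parameter keeps the helper easy to exercise with a stub; in real use you would pass the `subscriber` and `subscription_path` built in the question's code.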
I would suggest instead that you consider using an SQL database such as Cloud Spanner. You could then generate a UUID for each message, use it as the primary key for deduplication, and update the database transactionally so that no duplicates are stored.
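The primary-key idea can be sketched as follows. This uses Python's built-in `sqlite3` as a local stand-in for Cloud Spanner (the Spanner client API is different), and it assumes the publisher generates the UUID once per message and sends it along (for example as a message attribute), so a redelivered copy carries the same key. The table and function names are hypothetical.

```python
import sqlite3
import uuid


def make_store() -> sqlite3.Connection:
    """Hypothetical in-memory store standing in for a Spanner table."""
    conn = sqlite3.connect(":memory:")
    # The UUID is the primary key, so a second insert with the same UUID is rejected.
    conn.execute("CREATE TABLE messages (msg_uuid TEXT PRIMARY KEY, data BLOB NOT NULL)")
    return conn


def store_once(conn: sqlite3.Connection, msg_uuid: str, data: bytes) -> bool:
    """Insert the message in a transaction; return False if the UUID was already stored."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO messages (msg_uuid, data) VALUES (?, ?)", (msg_uuid, data)
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this message was seen before, so skip it.
        return False


if __name__ == "__main__":
    conn = make_store()
    key = str(uuid.uuid4())  # assumed to be generated by the publisher
    print(store_once(conn, key, b"hello"))  # True: first delivery stored
    print(store_once(conn, key, b"hello"))  # False: duplicate redelivery ignored
```

Letting the database's uniqueness constraint reject duplicates avoids a separate read-then-write check, which could race if two consumers handle the same redelivered message at once.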
I might be able to give you a better answer if you share more about what you plan to do with the deduplicated messages.