I was once asked on an interview, how would you deal with messages coming in out of order in a message queue. It has been a while and I have not found a definitive answer and I was wondering if an expert in the field can help me answer it to address my own curiosity.
I understand that some message queues provide exactly-once and FIFO guarantees. Also I am aware of the notion of event time and processing time in streaming systems. For instance, in log based message queues like Kafka, mixed up ordering may be less likely to happen due to the presence of offsets and message durability (I may be wrong). I have also thought about using timestamps requiring each message sender to record the time of message before sending it but this is fraught with inconsistency due to clock skew.
Given all of that, I am wondering how can one address mixed up ordering in a traditional messaging system like AMQP, JMS or RabbitMQ where a dozen of IOT devices may be sending messages and I as a consumer want to reconcile them in the correct order.
If queue your system is using, provides ordered message guarantee, then simply use that channel(like kakfa's single partition, AMQP under some settings). But if queue your system is using does not provide strict ordering then general Idea is that client can have monotonically increasing[1] number(or timestamp) attached with each message it sends to queue. This forms the basis of sequence which producer intends to send to its receivers.
How to get montonically increasing value:
Using timestamp: POSIX clock_gettime() function with CLOCK_MONOTONIC[2] provides option to get monotonically increasing timestamp, which can be used by producer to put timestamp on each message. Receiver can identify out of order messages when it sees that received message has timestamp older than latest message.
Using sequence number: Before sending each message you can simply increase an atomic counter and attach counter value to each message, so that receiver can know about intended ordering. This will form strictly increasing sequence. Approach is very similar to Lamport's logical clock[3] which provides virtual clock for producer.
Dealing with out of order messages on receiver side: This is pretty much application specific but in general you have 2 options when messages arrive out of order: a) discard the older message, like in cases in which receiver have to show latest value of a stock. b) Have buffer to reorder sequencing, like within a TCP connection(e.g. zookeeper uses TCP as queue for FIFO ordering [4-5])
Tools: If you are not adding timestamp to messages, then send all messages to Apache kafka single partition in sequence from producer, as this will ensure that receiver can receive messages in sequence.
If you are using messaging system which does not guarantee ordered delivery (like AMQP under some settings[6]), then you can consider adding additional monotonically increasing number/clock with each message.
[2] https://linux.die.net/man/2/clock_gettime
[3] https://en.wikipedia.org/wiki/Lamport_timestamps#Lamport's_logical_clock_in_distributed_systems
[5] http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf