amazon-web-services aws-lambda amazon-sqs aws-sqs-fifo

AWS SQS.FIFO reads the first 20,000 messages to determine message groups: in what order does it read them?


What mechanism does SQS.FIFO use to read the first 20,000 messages?

Here's an example for some context:

On a FIFO queue, we have message groups:

| Group  | Number of messages |
|--------|--------------------|
| A      | 50,000             |
| B      | 100                |
| C      | 5                  |

Timing of messages received on the queue:

  • Group A: added at a rate of 100 per second from 11:00:00 onwards
  • Group B: added at a rate of 10 per second from 11:00:01 onwards
  • Group C: added at 11:05:00

No delivery delays are applied to any of the messages. The queue is configured with a visibility timeout that matches the Lambda consumer that will be added later. Nothing is processing the queue yet.

Later on, a Lambda function is configured with the above queue as an event source, with a maximum batch size of 5 and long polling of 2 seconds. The Lambda function takes 1 minute to process each batch of events.

What would the first few batches contain?

| Batch | Messages | Consumer  |
|-------|----------|-----------|
| 1     | AAAAA    | lambda1   |
| 2     | AAAAA    | lambda1   |
| 3     | AAAAA    | lambda1   |
| 4     | BBBBB    | lambda2 ? |

The above model is what I expect to see if SQS.FIFO reads the messages ordered by time across all message groups. The alternative is that SQS.FIFO keeps reading from message group A until the total number of messages on the queue drops below 20,000.
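To make the first hypothesis concrete, here is a minimal sketch that lays out the example queue in arrival order and checks which groups fall inside the first 20,000 messages. It is only a static snapshot of the backlog, not SQS's actual implementation. The helper names (`arrivals`, `build_queue`) are made up for illustration, times are milliseconds after 11:00:00, and ties are assumed to keep the order A, B, C.

```python
from collections import Counter

WINDOW = 20_000  # look-ahead size quoted in the SQS docs


def arrivals(group, total, interval_ms, start_ms):
    """Return (arrival_ms, group) pairs for `total` evenly spaced sends."""
    return [(start_ms + i * interval_ms, group) for i in range(total)]


def build_queue():
    """Queue contents ordered by arrival time (hypothetical model)."""
    events = (
        arrivals("A", 50_000, 10, 0)      # 100 per second from 11:00:00
        + arrivals("B", 100, 100, 1_000)  # 10 per second from 11:00:01
        + arrivals("C", 5, 0, 300_000)    # all five at 11:05:00
    )
    # list.sort() is stable, so messages with equal timestamps
    # keep the order in which they were appended (A, then B, then C).
    events.sort(key=lambda event: event[0])
    return [group for _, group in events]


queue = build_queue()
window_groups = Counter(queue[:WINDOW])  # groups visible in the window
first_c = queue.index("C")               # zero-based position of the first C

print(dict(window_groups))
print("first C position:", first_c)
```

Under these assumptions the window contains messages from groups A and B but none from C, whose first message sits at roughly position 30,000. If the time-ordered model is right, about 10,000 messages would have to be consumed before group C becomes available.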

Could someone shed some light on the reading mechanism?

As stated in the docs:

AWS SQS.FIFO queue looks through the first 20k messages to determine available message groups. This means that if you have a backlog of messages in a single message group, you can't consume messages from other message groups that were sent to the queue at a later time until you successfully consume the messages from the backlog.


Solution

  • SQS.FIFO reads the first 20,000 messages across all message groups, ordered by time of receipt.

    I ran an experiment with three message groups containing 21k, 1k and 21k messages respectively, sent in that order. The queue was processed by a Lambda function with a maximum batch size of 10 messages. I added a delay of 1 s to the Lambda function.

    The total number of available messages in the queue was 42k. While the first 1,000 messages were being processed, the queue had only 10 messages in flight. Once the queue dropped below 41k, I could see 20 messages in flight, and this remained the case until the queue drained. Here is my mental model of what happened in that queue; the three message groups are represented by blue, green and red bars.

    [Image: diagram of the queue, with the three message groups shown as blue, green and red bars]
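The jump from 10 to 20 messages in flight after roughly the first 1,000 messages can be checked with a little arithmetic. This is a rough sketch, assuming the window is simply the first 20,000 messages still in the queue; `messages_until_visible` is a hypothetical helper, not an SQS API.

```python
WINDOW = 20_000  # look-ahead size quoted in the SQS docs


def messages_until_visible(position, window=WINDOW):
    """How many messages ahead of `position` must be consumed before the
    message at that zero-based position falls inside the window."""
    return max(0, position - window + 1)


# The first group holds 21,000 messages, so the second group's first
# message starts at zero-based position 21,000 in the queue.
second_group_start = 21_000
print(messages_until_visible(second_group_start))
```

With a 21k backlog in the first group, the second group only enters the window after about 1,000 messages have been consumed, which is consistent with the single batch of 10 in flight seen at the start of the experiment.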