
Kinesis Stream Consumption: LATEST vs. TRIM_HORIZON


I have a use case where I need to keep the Kinesis trigger on my consumer (let's call it Lambda B) disabled while the producer (let's call it Lambda A) writes to the Kinesis stream. Once the write is complete, I intend to enable the trigger, and Lambda B should then process the data already present in the stream. With this in mind, here is how I understand the starting positions:

LATEST - This implies that records written to stream after enabling the Lambda trigger will be processed. Records written during the disabled phase will not be processed. Doesn't suit my use case. Discarded.

TRIM_HORIZON - All the records in the stream will be processed. Okay, this works for my use case. BUT I'm imagining a case where the trigger goes through enabled(1) -> disabled -> enabled(2). After enabled(2), Lambda B will read records put in during the disabled state. That is fine. But will it also re-read records that were already read during the enabled(1) phase (since Kinesis retains records for 24 hours by default)? If so, this is an issue.

AT_TIMESTAMP - This requires manually supplying timestamps, which I do not want to do. Discarded.


Solution

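  • You only need to pick the starting position when you create the event source mapping. Once the mapping has been created and has processed records, it remembers the last record it processed and resumes from there when it is re-enabled, so records handled during enabled(1) are not read again after enabled(2).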

    So a sequence like this will give you what you want:

    1. Create Lambda B and event source mapping with TRIM_HORIZON (or LATEST; there aren't any messages in the stream, so they're functionally equivalent).
    2. Disable the event source mapping.
    3. Start Lambda A to write messages to the stream.
    4. Enable the event source mapping. Lambda B will process the messages in the stream.
    5. Repeat steps 2-4 as needed.
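
The steps above could be scripted roughly as follows. This is a minimal sketch rather than a definitive implementation: it assumes a boto3 Lambda client is passed in, and the helper names `create_mapping` and `set_trigger_enabled` are made up for illustration. The starting position is supplied only at creation time; later enable/disable calls just toggle the existing mapping by its UUID.

```python
def create_mapping(lambda_client, stream_arn, function_name):
    """Step 1: create the Kinesis trigger once, initially disabled.

    StartingPosition is only used here, when the mapping is created.
    """
    resp = lambda_client.create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="TRIM_HORIZON",
        Enabled=False,  # covers step 2: the mapping starts out disabled
    )
    return resp["UUID"]


def set_trigger_enabled(lambda_client, mapping_uuid, enabled):
    """Steps 2 and 4: toggle the existing mapping; no starting position is passed."""
    resp = lambda_client.update_event_source_mapping(
        UUID=mapping_uuid,
        Enabled=enabled,
    )
    # The state change is asynchronous, so this is typically a transitional
    # value such as "Enabling" or "Disabling".
    return resp.get("State")
```

With boto3 installed and credentials configured, this might be used as `uuid = create_mapping(boto3.client("lambda"), stream_arn, "lambda-b")`, then `set_trigger_enabled(client, uuid, True)` once Lambda A has finished writing, and `set_trigger_enabled(client, uuid, False)` before the next write phase. The stream ARN and the function name `lambda-b` are placeholders.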