apache-flinkflink-streamingamazon-kinesisamazon-kinesis-firehosedata-stream

Reading from a Kinesis data stream (or any data stream) in two different tumbling time windows


I have a Kinesis data stream, and I am consuming the data using tumbling windows. I have two use cases one is consuming the data in a 5-minute tumbling window, and the other is a 1-minute tumbling window. My understanding is that the data in the data stream can only be read once, so it is not possible to consume the same data record simultaneously in the data stream by two different consumers. Is that accurate?


Solution

  • I'm not familiar with the Kinesis source, but with other sources you can have two (or more) sources independently consuming from the same resource.

    However, that's not the best way to do this. Instead, you can fork the stream like this:

    events = env.fromSource(...);
    
    events.keyBy(...).window(1 min).sinkTo(...);
    events.keyBy(...).window(5 min).sinkTo(...);
    

    Or you could consider forking the stream after the 1 minute windows:

    resultsFrom1Min = events.keyBy(...).window(1 min);
    
    resultsFrom1Min.sinkTo(...);
    resultsFrom1Min.keyBy(...).window(5 min).sinkTo(...);