I am confused about whether a TUMBLE window is calculated at a regular interval and emits its elements for processing. For example, I have a query that is expected to work on a 10-second interval:
select id, key from eventTable GROUP BY TUMBLE(rowTime, INTERVAL '10' SECOND), id, key ;
Now let's say the application receives three events, E1, E2, and E3.
As you can see, E1 and E2 arrive within 5 seconds, and E3 arrives at 12:00:15.
If you are using event time processing, then the window that ends at 10:00:10 will be emitted when the watermark passes 10:00:10. If the watermarking is done in the usual bounded-out-of-orderness fashion, and if there are no other events, then the watermark won't advance until E3 is processed.
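To make the timing concrete, here is a small, self-contained Java simulation (it does not use Flink) of a 10-second event-time tumbling window fed by a bounded-out-of-orderness watermark. It assumes a 2-second out-of-orderness bound, measures timestamps in milliseconds from the start of the first window, and places E1 and E2 at hypothetical offsets of 2 s and 4 s, with E3 at 15 s. The watermark formula (largest timestamp seen, minus the bound, minus 1 ms) and the firing rule (watermark >= window end - 1) are modeled on Flink's behavior, but Flink emits watermarks periodically rather than after every event, which does not change the outcome here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Self-contained simulation (no Flink dependency) of an event-time tumbling
// window driven by a bounded-out-of-orderness watermark. Times are in ms.
class TumblingWindowSim {
    final long windowSize;
    final long maxOutOfOrderness;
    long maxTimestampSeen = Long.MIN_VALUE;
    // window start -> number of events assigned to that window
    final TreeMap<Long, Integer> openWindows = new TreeMap<>();
    final List<String> fired = new ArrayList<>();

    TumblingWindowSim(long windowSize, long maxOutOfOrderness) {
        this.windowSize = windowSize;
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Watermark = largest timestamp seen so far, minus the bound, minus 1 ms.
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrderness - 1;
    }

    void onEvent(long timestamp) {
        long start = timestamp - Math.floorMod(timestamp, windowSize);
        long end = start + windowSize;
        if (end - 1 <= currentWatermark()) {
            return; // window already fired: the event is late and is dropped
        }
        openWindows.merge(start, 1, Integer::sum);
        maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
        fireCompletedWindows();
    }

    // A window [start, end) fires once the watermark reaches end - 1.
    private void fireCompletedWindows() {
        long wm = currentWatermark();
        while (!openWindows.isEmpty()) {
            long start = openWindows.firstKey();
            long end = start + windowSize;
            if (wm < end - 1) {
                break; // later windows cannot be complete either
            }
            int count = openWindows.remove(start);
            fired.add("[" + start + ", " + end + ") count=" + count);
        }
    }

    public static void main(String[] args) {
        TumblingWindowSim sim = new TumblingWindowSim(10_000, 2_000);
        sim.onEvent(2_000);  // E1 (hypothetical offset)
        sim.onEvent(4_000);  // E2 (hypothetical offset)
        System.out.println("after E2: " + sim.fired); // prints: after E2: []
        sim.onEvent(15_000); // E3
        System.out.println("after E3: " + sim.fired); // prints: after E3: [[0, 10000) count=2]
    }
}
```

Nothing is emitted after E2. The first window only fires when E3 pushes the watermark to 12999, which is past the window's last timestamp (9999).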
If you require a watermarking strategy that takes idleness into account, I believe your only option is to use the DataStream API to create the stream and apply watermarking that deals with idle sources, and then convert the DataStream to a Table.
Note that what .withIdleness(...) does is mark a stream as idle, which keeps that stream from holding back the watermark. This solves the problem of one idle stream holding back the entire job if there are other, active streams. If you want the watermark to progress when absolutely nothing is happening, you'll need to do something more drastic.
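The difference between "one idle stream" and "everything idle" shows up in a simplified model of how an operator with several inputs combines watermarks: it takes the minimum over the inputs that are not marked idle. This is an illustrative sketch, not Flink's actual implementation; the class and method names are made up, and markIdle only roughly stands in for what withIdleness(...) triggers after its timeout.

```java
import java.util.Arrays;

// Simplified model (not Flink's implementation) of how an operator with
// several inputs combines their watermarks, and what marking an input idle does.
class IdlenessSim {
    private final long[] watermarks;
    private final boolean[] idle;
    private long combined = Long.MIN_VALUE; // last watermark emitted downstream

    IdlenessSim(int inputs) {
        watermarks = new long[inputs];
        Arrays.fill(watermarks, Long.MIN_VALUE);
        idle = new boolean[inputs];
    }

    // A new watermark from an input also marks that input as active again.
    void onWatermark(int input, long watermark) {
        watermarks[input] = Math.max(watermarks[input], watermark);
        idle[input] = false;
    }

    // Hypothetical stand-in for what withIdleness(...) triggers after its timeout.
    void markIdle(int input) {
        idle[input] = true;
    }

    // Minimum over the active inputs. Idle inputs are ignored; if every input
    // is idle, there is nothing to advance on, so the watermark stays put.
    long combinedWatermark() {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (!idle[i]) {
                anyActive = true;
                min = Math.min(min, watermarks[i]);
            }
        }
        if (anyActive && min > combined) {
            combined = min; // watermarks never move backwards
        }
        return combined;
    }

    public static void main(String[] args) {
        IdlenessSim op = new IdlenessSim(2);
        op.onWatermark(0, 12_999); // busy stream
        op.onWatermark(1, 1_999);  // quiet stream
        System.out.println(op.combinedWatermark()); // prints: 1999
        op.markIdle(1);
        System.out.println(op.combinedWatermark()); // prints: 12999
        op.markIdle(0);
        System.out.println(op.combinedWatermark()); // prints: 12999 (no progress)
    }
}
```

Marking the quiet input idle lets the busy input drive the watermark, but once every input is idle the watermark simply stops where it was, which is the "nothing is happening" case.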
The ideal solution is to have keepalive messages that originate from the same source, so that you know that the idleness is genuine, rather than an outage. Failing that, see ProcessingTimeTrailingBoundedOutOfOrdernessTimestampExtractor for an example of how to use a timer to detect idleness and advance the watermark based on the passage of time, rather than the arrival of new events. (Note that this example has not been updated to use the new WatermarkStrategy interface.)
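Below is a rough sketch of that timer-based idea in plain Java. It is not the code of ProcessingTimeTrailingBoundedOutOfOrdernessTimestampExtractor and not a Flink API: the class name, the idleTimeout parameter, and the assumption that event time keeps pace with processing time while the source is quiet are all illustrative. The caller passes the current processing time explicitly; in a real job that value would come from a periodic or timer callback.

```java
// Hypothetical watermark generator that, after a period with no events,
// advances the watermark with the passage of processing time. Illustrative
// only: not Flink's API and not the code of the class named above.
class ProcessingTimeFallbackWatermarks {
    private final long maxOutOfOrderness; // ms
    private final long idleTimeout;       // ms without events before falling back
    private long maxTimestamp = Long.MIN_VALUE;
    private long lastEventProcTime;
    private long lastWatermark = Long.MIN_VALUE;

    ProcessingTimeFallbackWatermarks(long maxOutOfOrderness, long idleTimeout) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        this.idleTimeout = idleTimeout;
    }

    void onEvent(long eventTimestamp, long nowMillis) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        lastEventProcTime = nowMillis;
    }

    // nowMillis is the current processing time, supplied by the caller.
    long currentWatermark(long nowMillis) {
        if (maxTimestamp == Long.MIN_VALUE) {
            return lastWatermark; // no events yet, nothing to extrapolate from
        }
        long idleFor = nowMillis - lastEventProcTime;
        // Assumption: while idle, event time advances as fast as processing time.
        long advance = idleFor >= idleTimeout ? idleFor : 0;
        long candidate = maxTimestamp + advance - maxOutOfOrderness - 1;
        lastWatermark = Math.max(lastWatermark, candidate); // stay monotonic
        return lastWatermark;
    }

    public static void main(String[] args) {
        ProcessingTimeFallbackWatermarks gen =
                new ProcessingTimeFallbackWatermarks(2_000, 5_000);
        gen.onEvent(2_000, 2_000); // E1 (hypothetical offset)
        gen.onEvent(4_000, 4_000); // E2 (hypothetical offset)
        System.out.println(gen.currentWatermark(6_000));  // prints: 1999 (not idle yet)
        System.out.println(gen.currentWatermark(9_000));  // prints: 6999 (idle for 5 s)
        System.out.println(gen.currentWatermark(12_000)); // prints: 9999, window [0, 10000) can fire
    }
}
```

With the same hypothetical E1 and E2 offsets, a 10-second window starting at 0 can close at processing time 12000 instead of waiting for E3. The price is that if the quiet period was really an outage, events that show up afterwards with older timestamps will be late, which is why keepalive messages from the source are the better option.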