Assuming all events arrive on time and no lateness is allowed, how can I run some processing only once the session window has ended, i.e. once the watermark has passed (lastEventInWindowTimestamp + inactivityGap)?
I couldn't find any API method that is called when this happens. Can I implement this logic using a custom ProcessWindowFunction?
Yes, a ProcessWindowFunction serves exactly this purpose. Such a function is called when the window is complete, and is passed (among other things) an Iterable containing the stream elements that have been assigned to the window. In the case of a session window, the ProcessWindowFunction isn't called until after the period of inactivity has passed.
Update: How can you report both the start and end timestamps for each session window?
I will assume that you can extract the timestamp for each event from the event itself. Then, if you are using a ProcessWindowFunction, you can iterate over the events in the window and determine the min and max timestamps for the events in the session; these will be the start and end timestamps.
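As a rough sketch of that iteration, the following plain Java (no Flink dependency) scans an Iterable of events and keeps the smallest and largest timestamps. The Event and SessionBounds classes are hypothetical names introduced only for illustration; in a real job the same loop would sit inside ProcessWindowFunction.process, which receives the window's elements as an Iterable.

```java
import java.util.Arrays;

// Hypothetical event type for illustration: only the timestamp matters here.
final class Event {
    final String key;
    final long timestamp;

    Event(String key, long timestamp) {
        this.key = key;
        this.timestamp = timestamp;
    }
}

// Hypothetical result type holding the first and last event timestamps of a session.
final class SessionBounds {
    final long start;
    final long end;

    SessionBounds(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Scans the window contents once and tracks the min and max timestamps.
    // Assumes the Iterable is non-empty, as a fired window holds at least one element.
    static SessionBounds of(Iterable<Event> events) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (Event e : events) {
            min = Math.min(min, e.timestamp);
            max = Math.max(max, e.timestamp);
        }
        return new SessionBounds(min, max);
    }
}
```

Note that these are the timestamps of the first and last events. If you instead want the bounds of the window itself, the Context passed to process exposes the window via context.window(), and for an event-time session window its end should be the last event timestamp plus the inactivity gap.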
If, on the other hand, you would rather use a reduce function that incrementally computes the window results, you can work with tuples that track the (min, max) timestamps for each window.
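A minimal sketch of the incremental variant, again in plain Java so it runs without Flink: each event is first mapped to a (timestamp, timestamp) pair, and pairs are then combined two at a time. The MinMax class and its method names are assumptions made for this example; in Flink the body of merge would be what you put into your reduce function, applied to a stream of such pairs.

```java
import java.util.List;

// Hypothetical (min, max) pair for illustration.
final class MinMax {
    final long min;
    final long max;

    MinMax(long min, long max) {
        this.min = min;
        this.max = max;
    }

    // A single event with timestamp ts starts out as the pair (ts, ts).
    static MinMax of(long timestamp) {
        return new MinMax(timestamp, timestamp);
    }

    // Combining two pairs keeps the smaller min and the larger max.
    // The operation is associative, so it can be applied one element at a time.
    static MinMax merge(MinMax a, MinMax b) {
        return new MinMax(Math.min(a.min, b.min), Math.max(a.max, b.max));
    }

    // Stand-in for what the window operator does as elements arrive.
    static MinMax reduceAll(List<Long> timestamps) {
        return timestamps.stream()
                .map(MinMax::of)
                .reduce(MinMax::merge)
                .orElseThrow(() -> new IllegalArgumentException("no timestamps"));
    }
}
```

The benefit of this approach is that only one small pair needs to be kept per window, rather than every event in the session.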