I am trying to use Kafka Streams to count unique visitors over a specific 24-hour time range, for example
2020-07-03 22:00:00 ~ 2020-07-04 21:59:59. The time window should advance automatically at 2020-07-04 22:00:00.
According to the documentation, tumbling time windows seem like a reasonable choice:
Duration windowSizeDuration = Duration.ofDays(1);
TimeWindows timeWindows = TimeWindows.of(windowSizeDuration);
However, I can't find any available Streams API to limit the time range. Can somebody give me some advice? Thanks
Tumbling windows are aligned to the epoch, meaning Unix timestamps in the UTC timezone, i.e., a 24h window starts at midnight UTC and ends at the following midnight UTC, as explained in the docs: https://kafka.apache.org/25/documentation/streams/developer-guide/dsl-api.html#tumbling-time-windows
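To make the alignment concrete, here is a minimal sketch in plain Java (no Kafka dependency) of how an epoch-aligned tumbling window start falls out of the record timestamp. The class and method names are made up for illustration; the rounding rule is what "aligned to the epoch" means.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Kafka Streams.
final class EpochAlignedWindow {

    // Round the timestamp down to the nearest multiple of the window size.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    public static void main(String[] args) {
        long sizeMs = Duration.ofDays(1).toMillis();
        long ts = Instant.parse("2020-07-03T22:30:00Z").toEpochMilli();
        long start = windowStart(ts, sizeMs);
        // The 22:30 event lands in the window that started at midnight UTC,
        // not in a window that starts at 22:00.
        System.out.println(Instant.ofEpochMilli(start) + " .. "
                + Instant.ofEpochMilli(start + sizeMs));
    }
}
```

So with a plain 24h tumbling window, an event at 22:30 UTC on 2020-07-03 is counted in the 2020-07-03 00:00 UTC window, which is why the boundaries cannot simply be moved to 22:00.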
You can either "shift" the timestamps of your events by using a custom timestamp extractor, or by using process() (e.g., context.forward(record.withTimestamp(...))) for Kafka Streams version 4.0+. For older Kafka Streams versions, you can use transform() instead (e.g., context.forward(..., To.all().withTimestamp(...))).
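A rough sketch of the "shift" arithmetic, in plain Java so it runs without Kafka. It assumes the desired boundary is 22:00 UTC (the question does not state a timezone), and all names here are hypothetical. The value returned by shift(...) is what a custom timestamp extractor or the process()/transform() step would assign as the record timestamp.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Kafka Streams.
final class ShiftedTimestamps {
    static final long DAY_MS = Duration.ofDays(1).toMillis();
    // Assumption: windows should start at 22:00 UTC, so move 22:00 onto 00:00.
    static final long SHIFT_MS = Duration.ofHours(2).toMillis();

    // Timestamp to hand to Kafka Streams instead of the real event time.
    static long shift(long eventTimeMs) {
        return eventTimeMs + SHIFT_MS;
    }

    // Epoch-aligned 24h tumbling window start, computed on shifted time.
    static long tumblingStart(long shiftedMs) {
        return shiftedMs - Math.floorMod(shiftedMs, DAY_MS);
    }

    // Translate a window start in shifted time back to real time.
    static long realWindowStart(long shiftedWindowStartMs) {
        return shiftedWindowStartMs - SHIFT_MS;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2020-07-04T21:59:59Z").toEpochMilli();
        long realStart = realWindowStart(tumblingStart(shift(ts)));
        System.out.println(Instant.ofEpochMilli(realStart)); // 2020-07-03T22:00:00Z
    }
}
```

Keep in mind that with this approach the window boundaries reported downstream are in shifted time, so whoever reads the results has to subtract the shift again.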
Or you define a custom TimeWindow yourself. You can find an example on GitHub: https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
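For a feel of what such a custom window has to compute, here is a sketch of just the boundary logic using java.time. It is not a copy of the linked class and does not implement any Kafka Streams interface; the class name, method name, and the (zone, startHour) parameters are assumptions for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper showing only the boundary computation.
final class DailyWindowBounds {

    // Returns {startMs, endMs} of the daily window containing the timestamp,
    // where each window starts at startHour in the given zone. End is exclusive.
    static long[] windowFor(long timestampMs, ZoneId zone, int startHour) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMs).atZone(zone);
        ZonedDateTime start = t.truncatedTo(ChronoUnit.DAYS).withHour(startHour);
        if (t.isBefore(start)) {
            // Before today's start hour: the event belongs to yesterday's window.
            start = start.minusDays(1);
        }
        ZonedDateTime end = start.plusDays(1);
        return new long[] {
            start.toInstant().toEpochMilli(),
            end.toInstant().toEpochMilli()
        };
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2020-07-04T21:59:59Z").toEpochMilli();
        long[] w = windowFor(ts, ZoneId.of("UTC"), 22);
        System.out.println(Instant.ofEpochMilli(w[0]) + " .. " + Instant.ofEpochMilli(w[1]));
    }
}
```

Unlike the timestamp shift, computing boundaries in a ZoneId means a "day" can be 23 or 25 hours long around daylight saving changes, which may or may not be what you want for a daily unique-visitor count.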