apache-kafka apache-kafka-streams

How to define a time window which should advance at specific time of the day?


I am trying to use Kafka Streams to count unique visitors over a specific 24-hour time range, for example:

2020-07-03 22:00:00 ~ 2020-07-04 21:59:59 (24 hours). The time window should advance automatically at 2020-07-04 22:00:00.

According to the documentation, tumbling time windows seem like a reasonable choice:

Duration windowSizeDuration = Duration.ofDays(1);
TimeWindows timeWindows = TimeWindows.of(windowSizeDuration);

However, I can't find any Streams API to control when the window starts and ends. Can somebody give me some advice? Thanks.


Solution

  • Tumbling windows are aligned to the epoch, that is, to Unix timestamps in the UTC time zone, so a 24h window starts at midnight UTC and ends at the next midnight UTC, as explained in the docs: https://kafka.apache.org/25/documentation/streams/developer-guide/dsl-api.html#tumbling-time-windows

    You can "shift" the timestamps of your events, either by using a custom timestamp extractor or by using process() (e.g., context.forward(record.withTimestamp(...))) in Kafka Streams 4.0+.

    For older Kafka Streams versions, you can use transform() instead (e.g., context.forward(..., To.all().withTimestamp(...))).

    Or you can define a custom TimeWindow yourself. You can find an example on GitHub: https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
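
To make the "shift" idea concrete, here is a minimal plain-Java sketch of the arithmetic only; it does not use the Kafka Streams API. It assumes the 22:00 boundary from the question is meant in UTC, so the shift is +2 hours. The class and method names (ShiftedDailyWindow, alignedWindowStart, shiftTimestamp, originalWindowStart) are hypothetical and exist only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Illustration only: models epoch-aligned 24h windows and a timestamp shift.
class ShiftedDailyWindow {
    static final long SIZE_MS = Duration.ofDays(1).toMillis();
    // Assumption: windows should roll over at 22:00 UTC, i.e. 2 hours before midnight UTC.
    static final long SHIFT_MS = Duration.ofHours(2).toMillis();

    // Start of the epoch-aligned 24h window that contains the given timestamp.
    public static long alignedWindowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, SIZE_MS);
    }

    // The timestamp a custom extractor or processor would assign instead of the original one.
    public static long shiftTimestamp(long timestampMs) {
        return timestampMs + SHIFT_MS;
    }

    // Window start expressed in original (unshifted) time: window the shifted
    // timestamp, then undo the shift on the resulting boundary.
    public static long originalWindowStart(long timestampMs) {
        return alignedWindowStart(shiftTimestamp(timestampMs)) - SHIFT_MS;
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2020-07-03T22:30:00Z").toEpochMilli();
        // Without a shift, the event falls into the window starting at midnight UTC.
        System.out.println(Instant.ofEpochMilli(alignedWindowStart(ts)));  // 2020-07-03T00:00:00Z
        // With the shift, the same event falls into the window starting at 22:00 UTC.
        System.out.println(Instant.ofEpochMilli(originalWindowStart(ts))); // 2020-07-03T22:00:00Z
    }
}
```

In a real topology the shift would be applied in the timestamp extractor or processor mentioned above. Window boundaries reported downstream would then be in shifted time, so a consumer would likely need to subtract the same offset to recover the 22:00 boundaries.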