[SOLVED] Does stream time in a windowed table advance per key or globally?

I have a tricky case where a lot of historical JSON data was written to Kafka with a custom partitioning strategy.

We are currently developing a system to make use of this historical data and perform various aggregations on top of it. Most of these are simple aggregations like SUM, COUNT, etc., but they all have a set window size of 1 hour and a grace period of 72 hours.

The problem is: the aggregations need to be done based on an ID which is not the key of the record, but part of its value, hence the topic will need to be repartitioned.

I am unable to find concrete documentation as to the expected behaviour of my windowed tables when this happens.

For example:

Record 1 @ TS 1, partition 0
Record 2 @ TS 2, partition 0
Record 3 @ TS 3, partition 0

Could ostensibly turn into:

Record 1 @ TS 1, partition 0
Record 2 @ TS 2, partition 2
Record 3 @ TS 3, partition 6

(or any other combination) after repartitioning on the given ID. Basically, the potential problem (?) arises because the total per-partition ordering is no longer guaranteed after repartitioning.

Other cases seem even worse, such that:

Record 1 @ TS 1, partition 0
Record 2 @ TS 2, partition 1
Record 3 @ TS 3, partition 2

might turn into:

Record 3 @ TS 3, partition 9
Record 1 @ TS 1, partition 9
Record 2 @ TS 2, partition 9

where the timestamps are not strictly increasing anymore within a single partition after repartitioning.

So, my questions are:

Is there any rule of thumb for repartitioning this kind of data safely? (probably not)
I assume that, if a KTable sees the above repartitioned stream, the global stream time is effectively "broken". The question is: does this affect my windowed aggregations, or does Kafka Streams advance the stream time per key in KTables? Worded another way: as long as the ordering within the partition for the same window key is guaranteed, will I still get correct results?

Solution

Stream-time is advanced per task (i.e., per partition), not per key (but also not globally).
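
To make the per-task behaviour concrete, here is a toy model in plain Java. It is not the Kafka Streams API: the class StreamTimeModel and its methods are made up for illustration, and it assumes that stream time is simply the highest record timestamp seen so far in a partition. It replays the second example from the question, where partition 9 receives TS 3, then 1, then 2.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamTimeModel {
    // Hypothetical model: one stream-time value per partition (task),
    // assumed to be the highest record timestamp observed so far.
    private final Map<Integer, Long> streamTimeByPartition = new HashMap<>();

    /** Observes a record and returns the partition's stream time afterwards. */
    public long observe(int partition, long timestamp) {
        return streamTimeByPartition.merge(partition, timestamp, Math::max);
    }

    /** Returns the current stream time of a partition, or Long.MIN_VALUE if it has seen no records. */
    public long streamTime(int partition) {
        return streamTimeByPartition.getOrDefault(partition, Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        StreamTimeModel model = new StreamTimeModel();
        // Partition 9 receives the records out of timestamp order.
        model.observe(9, 3L);
        model.observe(9, 1L);
        model.observe(9, 2L);
        // Partition 0 is tracked independently of partition 9.
        model.observe(0, 1L);
        System.out.println("partition 9 stream time: " + model.streamTime(9)); // 3
        System.out.println("partition 0 stream time: " + model.streamTime(0)); // 1
    }
}
```

In this model the keys never appear: two different IDs that land in the same partition share one stream time, while the same timestamps in another partition do not affect it.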

The way to address the out-of-order data in the repartition topic is to use the grace period. It should be large enough to account for the disorder and to ensure that no window is closed before all of its input data has been read from the repartition topic.
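
As a rough illustration of how the grace period interacts with stream time, the following plain-Java sketch models a single partition with the 1 hour window and 72 hour grace period from the question. It is a simplified model, not Kafka Streams code: the class GraceModel is hypothetical, timestamps are assumed to be non-negative milliseconds, windows are assumed to be aligned to multiples of the window size, and a window is assumed to be closed once stream time reaches window end plus grace.

```java
import java.time.Duration;

public class GraceModel {
    private final long sizeMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE; // highest timestamp seen so far
    private int dropped = 0;

    public GraceModel(Duration size, Duration grace) {
        this.sizeMs = size.toMillis();
        this.graceMs = grace.toMillis();
    }

    /** Returns true if the record is accepted, false if it is dropped as late. */
    public boolean offer(long timestampMs) {
        // Assumption: stream time advances before the lateness check.
        streamTime = Math.max(streamTime, timestampMs);
        long windowEnd = timestampMs - (timestampMs % sizeMs) + sizeMs;
        // Assumption: the window stays open while streamTime < windowEnd + grace.
        boolean open = streamTime < windowEnd + graceMs;
        if (!open) {
            dropped++;
        }
        return open;
    }

    public int dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        GraceModel model = new GraceModel(Duration.ofHours(1), Duration.ofHours(72));
        long hour = Duration.ofHours(1).toMillis();
        System.out.println(model.offer(100 * hour)); // true: advances stream time to 100h
        System.out.println(model.offer(30 * hour));  // true: window closes at 31h + 72h = 103h
        System.out.println(model.offer(20 * hour));  // false: window closed at 21h + 72h = 93h
        System.out.println("dropped: " + model.dropped()); // 1
    }
}
```

Under these assumptions, a record survives as long as it is less than roughly the grace period behind the newest record already read from the same partition, so the grace period has to cover the worst-case timestamp gap that repartitioning can introduce.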

It's hard to say a priori how large the grace period would need to be, but there is a "record dropped" metric that you can monitor to see if input data is dropped as "late" because its window was already closed.