Tags: apache-storm, apache-storm-topology, apache-storm-configs

Apache Storm tumbling window produces duplicate tuples when the message timeout is reached


I am using BaseWindowedBolt in Apache Storm, and when the message timeout is reached I see duplicate tuples. My test configuration is as follows: a Kafka spout reads data from a topic, and my bolt uses a tumbling window of size 2 with a message timeout of 30 seconds.

I produced one message to the topic and waited for the timeout to occur. When I then ran the topology locally and debugged it, I saw the same tuple appear twice in the TupleWindow object.

Is this the expected behaviour, or am I doing something wrong? Ideally, when the timeout occurs, Storm should process only the messages that actually arrived in the window, i.e. 1.


Solution

  • I found the reason for this behaviour. When the message timeout was reached, the last message had still not been processed (and so had not been acked). Storm therefore treated that message as failed and replayed it, which put the same tuple into the tuple window a second time. To fix this, either use a longer message timeout or a lower window count.
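
The mechanism described above can be reproduced with a small stand-alone simulation. This is only a simplified model written for illustration, not Storm's own code: the class `TumblingWindowReplayDemo` and the method `firstWindow` are hypothetical names, and the model assumes that a tuple is acked only once its window fires and that every unacked tuple is replayed each time the timeout elapses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TumblingWindowReplayDemo {

    /**
     * Simplified model (an assumption, not Storm internals): tuples sit in a
     * count-based tumbling window until it holds windowCount entries. While the
     * window is not full, nothing is acked, so every timeoutSecs seconds the
     * spout replays all pending messages. Returns the contents of the first
     * window that fires within runSecs, or an empty list if none fires.
     */
    public static List<String> firstWindow(List<String> produced, int windowCount,
                                           int timeoutSecs, int runSecs) {
        // Initial delivery at t = 0: every produced message enters the window.
        List<String> window = new ArrayList<>(produced);

        // Each elapsed timeout fails the unacked messages and re-emits them.
        for (int t = timeoutSecs; t <= runSecs && window.size() < windowCount; t += timeoutSecs) {
            for (String msg : produced) {
                if (window.size() < windowCount) {
                    window.add(msg); // the replayed copy lands in the same window
                }
            }
        }

        if (window.size() < windowCount) {
            return new ArrayList<>(); // window never filled, so it never fired
        }
        return new ArrayList<>(window.subList(0, windowCount));
    }

    public static void main(String[] args) {
        // One message, window count 2, timeout 30 s: the replay fills the window.
        System.out.println(firstWindow(Arrays.asList("msg-1"), 2, 30, 60));
        // prints [msg-1, msg-1]

        // Two messages fill the window before any timeout: no duplicate.
        System.out.println(firstWindow(Arrays.asList("msg-1", "msg-2"), 2, 30, 60));
        // prints [msg-1, msg-2]
    }
}
```

In a real topology the two knobs from the answer would presumably correspond to the message timeout (for example via `Config.setMessageTimeoutSecs`) and the count passed to the bolt's tumbling window. The point of the sketch is only that the window must be able to fill and fire before the timeout expires.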