parallel-processing, apache-storm, executors

Executors in Storm


I have a question about Storm functionality. Suppose I have a spout that reads a CSV file and emits records chunk by chunk. That is, it emits 100 records at a time to the bolt.

My question is whether a single chunk received by the bolt will be sent to only one executor, or will be divided among different executors for the sake of parallelism.

Note : The bolt has 5 executors.


Solution

  • What do you mean by "it emits 100 records at a time"? Does it mean that a single tuple contains 100 CSV lines? Or do you emit 100 tuples (each containing a single CSV line) in a single nextTuple() call?

    1. For the first case, Storm cannot parallelize those 100 lines within a single tuple. Storm can only send different tuples to different executors.
    2. For the second case, Storm will distribute the 100 tuples over different executors (depending, of course, on the connection pattern you have chosen).

    One side remark: it is considered bad practice to emit multiple tuples in a single call to nextTuple(). If nextTuple() blocks for any reason, the spout thread is blocked and cannot (for example) react to incoming acks. Best practice is to emit a single tuple per call to nextTuple(). If no tuple is available to be emitted, you should return without emitting, rather than block waiting until a tuple becomes available.
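    To illustrate the recommended non-blocking contract, here is a minimal self-contained sketch (not the actual Storm API; CsvSpoutSketch and feed() are hypothetical stand-ins). It models a spout whose nextTuple() emits at most one record per call and returns immediately, without blocking, when no input is ready:

    ```java
    import java.util.ArrayDeque;
    import java.util.Queue;

    // Hypothetical sketch of the recommended nextTuple() pattern:
    // one tuple per call, never block when nothing is available.
    public class CsvSpoutSketch {
        private final Queue<String> pending = new ArrayDeque<>();

        // Stand-in for reading new CSV lines into the spout's buffer.
        public void feed(String csvLine) {
            pending.add(csvLine);
        }

        // At most one emitted tuple per call; null means "nothing emitted".
        // poll() is non-blocking, so the spout thread stays free to
        // handle acks between calls.
        public String nextTuple() {
            return pending.poll();
        }
    }
    ```

    In a real Storm spout you would call collector.emit(...) instead of returning the record, but the key point is the same: one emit per nextTuple() call, and an immediate return when the buffer is empty.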