apache-storm, spout

Is there a way to control the number of spouts per machine in Apache Storm?


When declaring a topology in Apache Storm, is there a way to control how many spout instances run on each machine?

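import org.apache.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// Parallelism hint of 5: five executors for this spout across the whole topology
builder.setSpout("myspout", new MyDoSomethingOnTheHost(), 5);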

In the examples and the API documentation, the parallelism seems to be only a per-topology count, but in my case I want to make sure that the spout runs its task on every host, exactly once per host.

As far as I understand, even with 5 machines in the example above, there seems to be no way to control how many spouts are launched per machine; in the worst case, all 5 spout instances could run on a single host.


Solution

  • The short answer to the question is "No", but here are some workarounds:

    1. The number of spouts is set when you submit the topology, and there is no way to change it programmatically after it starts. You can stop the topology with "kill" and resubmit it with a new configuration. This is done from the command line with storm kill topology-name [-w wait-time-secs], so it has to be controlled from outside your topologies.

    2. You can start all the spouts you might need up front and use the deactivate() and activate() methods of the ISpout interface. This can be controlled programmatically by the topology itself. Storm does not call nextTuple() on a deactivated spout, but the spout stays in memory and can be reactivated instantly. This works well for the cold start of a large topology with a hundred workers.

    In my experience, spouts aren't the bottleneck; it's always the bolts, especially in large, indirect topologies. One bolt suddenly dies without handling an exception, another works slowly because its data source is not responding, and your performance graph ends up looking like a hairbrush, with the gap between the teeth equal to topology.message.timeout.secs.
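For workaround 1 (kill and resubmit), the kill step can be scripted from outside the topology. The following is a minimal sketch, assuming the storm CLI is installed and on the PATH of the calling process. The class and method names (StormKillSketch, killCommand, runKill) and the topology name "my-topology" are hypothetical; only the storm kill topology-name [-w wait-time-secs] syntax comes from the answer. After the kill completes, you would resubmit the topology with a different parallelism hint for the spout.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StormKillSketch {

    // Builds the argument list for: storm kill topology-name [-w wait-time-secs]
    // (assumption: the storm CLI is on the PATH of this process).
    static List<String> killCommand(String topologyName, Integer waitSecs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("storm");
        cmd.add("kill");
        cmd.add(topologyName);
        if (waitSecs != null) {
            // -w: how long Storm waits between deactivating and killing the topology
            cmd.add("-w");
            cmd.add(Integer.toString(waitSecs));
        }
        return cmd;
    }

    // Runs the command, streaming its output to this process, and returns the exit code.
    static int runKill(String topologyName, Integer waitSecs)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(killCommand(topologyName, waitSecs))
                .inheritIO()
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical topology name; prints the command instead of running it.
        System.out.println(String.join(" ", killCommand("my-topology", 30)));
    }
}
```

Building the argument list separately from running it keeps the command easy to log and check before it actually kills anything.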
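Workaround 2 relies on the fact that a deactivated spout is kept in memory but receives no nextTuple() calls. The self-contained toy model below illustrates that lifecycle without a Storm dependency. ToySpout, CountingSpout and ToyExecutor are hypothetical classes, not Storm's API: in real Storm the executor tracks activation, and your spout only implements the activate() and deactivate() callbacks from ISpout.

```java
import java.util.ArrayList;
import java.util.List;

public class SpoutActivationSketch {

    // Hypothetical stand-in for a few ISpout callbacks (NOT Storm's interface).
    interface ToySpout {
        void activate();
        void deactivate();
        void nextTuple();
    }

    // Counts "emitted" tuples and remembers the last lifecycle callback it received.
    static class CountingSpout implements ToySpout {
        int emitted = 0;
        String lastCallback = "none";

        public void activate() { lastCallback = "activate"; }
        public void deactivate() { lastCallback = "deactivate"; }
        public void nextTuple() { emitted++; }
    }

    // Toy executor: skips nextTuple() while its spout is deactivated, but keeps
    // the spout object alive, so reactivation is just a flag flip plus a callback.
    static class ToyExecutor {
        final ToySpout spout;
        private boolean active = false;

        ToyExecutor(ToySpout spout) { this.spout = spout; }

        void setActive(boolean wanted) {
            if (wanted && !active) {
                spout.activate();
            } else if (!wanted && active) {
                spout.deactivate();
            }
            active = wanted;
        }

        void tick() {
            if (active) {
                spout.nextTuple();
            }
        }
    }

    // One scheduling round: every active spout gets one nextTuple() call.
    static void runRound(List<ToyExecutor> executors) {
        for (ToyExecutor e : executors) {
            e.tick();
        }
    }

    // Starts 4 spouts up front, activates 2 for the cold start, then scales up and down.
    static List<CountingSpout> demo() {
        List<CountingSpout> spouts = new ArrayList<>();
        List<ToyExecutor> executors = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            CountingSpout s = new CountingSpout();
            spouts.add(s);
            executors.add(new ToyExecutor(s));
        }
        executors.get(0).setActive(true);
        executors.get(1).setActive(true);
        runRound(executors);              // only spouts 0 and 1 emit
        executors.get(2).setActive(true);
        executors.get(3).setActive(true);
        runRound(executors);              // all four emit
        executors.get(0).setActive(false);
        runRound(executors);              // spout 0 is skipped but still in memory
        return spouts;
    }

    public static void main(String[] args) {
        for (CountingSpout s : demo()) {
            System.out.println(s.emitted + " " + s.lastCallback);
        }
    }
}
```

Running main prints one line per spout: "2 deactivate", "3 activate", "2 activate", "2 activate".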