amazon-sqs, apache-storm

Apache Storm - use multiple spouts?


So I'm trying to configure my spout(s) to read from an Amazon SQS queue. Now, I want to share the load across multiple spouts.

I understand it's possible to have multiple threads, but can I have two or more different spout instances/applications reading from the same queue and emitting into the same topology? For example, Spout A and Spout B both read from SQS and both emit to bolt C?


Solution

  • Of course, you can have multiple spouts, but you have to configure them so the same element is not submitted twice (unless your topology accepts that by design). Processing the same element more than once leads to incorrect counters, for instance.

    As a starting point, check Storm's concurrency model: executors (threads) and tasks (instances) per spout/bolt, and choose the numbers you want (see the topology sketch at the end of this answer).

    In your code, you have to make sure you don't process the same tuples twice or more. You can handle this before Storm (for instance, a queue that never hands out the same element twice even when it is processed/emptied by many spouts, or multiple queues, one per spout; beware of transactions), or inside Storm (one spout only processes messages with param x, another only messages with param y, and a message can never be both x and y at the same time), as in the spout sketch below.
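
Here is a minimal sketch of the "x param in one spout, y in another" idea. It assumes Storm 2.x package names (older releases used backtype.storm.*), the AWS SDK for Java v1, and a hypothetical message attribute named "partition" that your producers set; the class name and attribute are illustrative, not part of any library:

```java
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

// Hypothetical spout that only emits messages whose "partition" attribute
// matches the value it was constructed with, so spout A ("x") and
// spout B ("y") never process the same class of message.
public class PartitionedSqsSpout extends BaseRichSpout {
    private final String queueUrl;
    private final String partition;          // e.g. "x" for spout A, "y" for spout B
    private transient AmazonSQS sqs;
    private transient SpoutOutputCollector collector;

    public PartitionedSqsSpout(String queueUrl, String partition) {
        this.queueUrl = queueUrl;
        this.partition = partition;
    }

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.sqs = AmazonSQSClientBuilder.defaultClient();
    }

    @Override
    public void nextTuple() {
        ReceiveMessageRequest req = new ReceiveMessageRequest(queueUrl)
                .withMessageAttributeNames("partition")
                .withMaxNumberOfMessages(10);
        for (Message msg : sqs.receiveMessage(req).getMessages()) {
            String msgPartition = msg.getMessageAttributes().containsKey("partition")
                    ? msg.getMessageAttributes().get("partition").getStringValue()
                    : "";
            if (partition.equals(msgPartition)) {
                collector.emit(new Values(msg.getBody()), msg.getMessageId());
                // Deleting right away gives at-most-once semantics; for at-least-once,
                // keep the receipt handle and delete only in ack().
                sqs.deleteMessage(queueUrl, msg.getReceiptHandle());
            }
            // Messages for the other partition are left undeleted; once their
            // visibility timeout expires they become visible to the other spout.
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("body"));
    }
}
```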
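
And a sketch of the topology wiring itself, showing both things the question asks about: two spout ids feeding the same bolt C, and (commented out) a single spout definition scaled out with executors and tasks. The queue URL and CounterBolt are placeholders for your own values and bolt implementation:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class SqsTopology {
    public static void main(String[] args) throws Exception {
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

        TopologyBuilder builder = new TopologyBuilder();

        // Spout A and spout B read the same queue but disjoint partitions ("x" vs "y").
        builder.setSpout("spout-a", new PartitionedSqsSpout(queueUrl, "x"));
        builder.setSpout("spout-b", new PartitionedSqsSpout(queueUrl, "y"));

        // Alternative: one spout definition scaled out with 2 executors (threads)
        // and 4 tasks (spout instances) instead of two separate spout ids:
        // builder.setSpout("sqs-spout", new PartitionedSqsSpout(queueUrl, "x"), 2).setNumTasks(4);

        // Bolt C receives tuples from both spouts; CounterBolt is a placeholder
        // for your own bolt implementation.
        builder.setBolt("bolt-c", new CounterBolt(), 2)
               .shuffleGrouping("spout-a")
               .shuffleGrouping("spout-b");

        Config conf = new Config();
        conf.setNumWorkers(2);

        // Run locally for testing; use StormSubmitter on a real cluster.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("sqs-demo", conf, builder.createTopology());
    }
}
```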