workflowscalabilitynservicebuspipelinenservicebus-distributor

NServiceBus pipeline with Distributors


I'm building a processing pipeline with NServiceBus but I'm having trouble with the configuration of the distributors in order to make each step in the process scalable. Here's some info:

Hopefully these assumptions are good otherwise I'm in more trouble than I thought.

For the sake of simplicity, let's forget about forking or joining and consider a simple pipeline, with Step A followed by Step B, and ending with Step C. Each step gets its own distributor and can have many nodes processing messages.

Here are the relevant parts of the config files, where # denotes the number of the worker, (i.e. there are input queues NodeA.1 and NodeA.2):

NodeA:
<MsmqTransportConfig InputQueue="NodeA.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeA.Distrib.Control" DistributorDataAddress="NodeA.Distrib.Data" >
    <MessageEndpointMappings>
    </MessageEndpointMappings>
</UnicastBusConfig>

NodeB:
<MsmqTransportConfig InputQueue="NodeB.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeB.Distrib.Control" DistributorDataAddress="NodeB.Distrib.Data" >
    <MessageEndpointMappings>
        <add Messages="Messages.EventA, Messages" Endpoint="NodeA.Distrib.Data" />
    </MessageEndpointMappings>
</UnicastBusConfig>

NodeC:
<MsmqTransportConfig InputQueue="NodeC.#" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig DistributorControlAddress="NodeC.Distrib.Control" DistributorDataAddress="NodeC.Distrib.Data" >
    <MessageEndpointMappings>
        <add Messages="Messages.EventB, Messages" Endpoint="NodeB.Distrib.Data" />
    </MessageEndpointMappings>
</UnicastBusConfig>

And here are the relevant parts of the distributor configs:

Distributor A:
<add key="DataInputQueue" value="NodeA.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeA.Distrib.Control"/>
<add key="StorageQueue" value="NodeA.Distrib.Storage"/>

Distributor B:
<add key="DataInputQueue" value="NodeB.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeB.Distrib.Control"/>
<add key="StorageQueue" value="NodeB.Distrib.Storage"/>

Distributor C:
<add key="DataInputQueue" value="NodeC.Distrib.Data"/>
<add key="ControlInputQueue" value="NodeC.Distrib.Control"/>
<add key="StorageQueue" value="NodeC.Distrib.Storage"/>

I'm testing using 2 instances of each node, and the problem seems to come up in the middle at Node B. There are basically 2 things that might happen:

  1. Both instances of Node B report that it is subscribing to EventA, and also that NodeC.Distrib.Data@MYCOMPUTER is subscribing to the EventB that Node B publishes. In this case, everything works great.
  2. Both instances of Node B report that it is subscribing to EventA, however, one worker says NodeC.Distrib.Data@MYCOMPUTER is subscribing TWICE, while the other worker does not mention it.

In the second case, which seem to be controlled only by the way the distributor routes the subscription messages, if the "overachiever" node processes an EventA, all is well. If the "underachiever" processes EventA, then the publish of EventB has no subscribers and the workflow dies.

So, my questions:

  1. Is this kind of setup possible?
  2. Is the configuration correct? It's hard to find any examples of configuration with distributors beyond a simple one-level publisher/2-worker setup.
  3. Would it make more sense to have one central broker process that does all the non-computationally-intensive traffic cop operations, and only sends messages to processes behind distributors when the task is long-running and must be load balanced?
    • Then the load-balanced nodes could simply reply back to the central broker, which seems easier.
    • On the other hand, that seems at odds with the decentralization that is NServiceBus's strength.
    • And if this is the answer, and the long running process's done event is a reply, how do you keep the Publish that enables later extensibility on published events?

Solution

  • The problem you have is that your nodes don't see each others list of subscribers. The reason you're having that problem is that your trying out a production scenario (scale-out) under the default NServiceBus profile (lite) which doesn't support scale-out but makes single-machine development very productive.

    To solve the problem, run the NServiceBus host using the production profile as described on this page:

    http://docs.particular.net/nservicebus/hosting/nservicebus-host/profiles

    That will let different nodes share the same list of subscribers.

    Other than that, your configuration is right on.