WCF Reliable Messaging: stuttering service after maxPendingChannels increase

Tags: wcf, .net-4.0, ws-reliablemessaging


We have an issue where, during load testing, if we fire calls rapidly at one of our services we get the following error:

"System.ServiceModel.ServerTooBusyException: The request to create a reliable session has been refused by the RM Destination. Server 'net.tcp://localhost:10511/ParameterMonitorService' is too busy to process this request. Try again later. The channel could not be opened."

We increased the value of maxPendingChannels from its default of 4 to 128 and then beyond, and the error disappeared. Now, however, rather than throwing the exception, the service just stops processing messages under load and then begins again several minutes later.

It does not seem to drop anything; it just hangs for a while. The more we pound the service, the longer this recovery seems to take.

The service is configured as PerCall with ConcurrencyMode.Multiple (sketched in code after the binding configuration below). Other behavior settings are:

    <serviceThrottling maxConcurrentCalls="100" maxConcurrentSessions="100" maxConcurrentInstances="100" />

    <customBinding>
      <binding name="Services_Custom_Binding" openTimeout="00:00:20" sendTimeout="00:01:00">
        <reliableSession ordered="true" inactivityTimeout="00:10:00" maxPendingChannels="128" flowControlEnabled="true" />
        <binaryMessageEncoding>
          <readerQuotas maxDepth="32" maxStringContentLength="8192" maxArrayLength="16384"
                        maxBytesPerRead="4096" maxNameTableCharCount="16384" />
        </binaryMessageEncoding>
        <tcpTransport maxPendingConnections="100" listenBacklog="100" />
      </binding>
    </customBinding>
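
For reference, the per-call / multiple-concurrency setup looks roughly like this on the service class (the class, contract, operation and type names below are placeholders, not our real ones):

    using System.ServiceModel;

    // Sketch only: illustrates the InstanceContextMode/ConcurrencyMode combination described above.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall,
                     ConcurrencyMode = ConcurrencyMode.Multiple)]
    public class ParameterMonitorService : IParameterMonitorService // placeholder contract name
    {
        public void Submit(ParameterReading reading) // placeholder operation and type
        {
            // ... per-call work ...
        }
    }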

We are kind of stuck. Any help appreciated!


Solution

  • This is a classic performance tuning story. By reconfiguring the throttle on reliable sessions you have removed what used to be the bottleneck and moved it somewhere else in your system.

    You really can't expect people to pluck a diagnosis of where the bottleneck now lies out of thin air, without any details of how your service is hosted, on what hardware, what it is doing, or how it goes about doing it. You need to instrument your system as comprehensively as you can, using Windows Performance Monitor counters, and interpret them to get an idea of where resource contention is now happening (a rough sampling sketch follows below).

    My first guess would be that the increased concurrency after removing the session throttle is causing contention for managed thread pool threads (a small probe for that is also sketched below), but this is only a guess; you really want to base the diagnosis on evidence, not guesswork.
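
    As a hedged illustration of that kind of instrumentation (the counter category and names below are standard Windows/.NET ones, but the .NET CLR instance name is a placeholder you would replace with your host's process name), a crude in-process sampler can be as simple as:

        using System;
        using System.Diagnostics;
        using System.Threading;

        class CounterProbe
        {
            static void Main()
            {
                // "Processor\% Processor Time" is a standard counter; the .NET CLR
                // instance name below is a placeholder - use your host's process name.
                var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
                var contention = new PerformanceCounter(
                    ".NET CLR LocksAndThreads", "Contention Rate / sec", "YourServiceHostProcess");

                cpu.NextValue();          // the first read of a rate counter is always 0
                contention.NextValue();

                while (true)
                {
                    Thread.Sleep(1000);
                    Console.WriteLine("{0:HH:mm:ss}  CPU {1,5:F1}%  lock contention {2,7:F1}/s",
                        DateTime.Now, cpu.NextValue(), contention.NextValue());
                }
            }
        }

    Perfmon itself is usually more convenient than code; the point is to capture CPU, lock contention, disk and database waits, and the WCF ServiceModel counters (those need performanceCounters enabled in the host's <diagnostics> element) across the window in which the service stalls.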
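
    If you want to test the thread pool theory directly, a throwaway probe along these lines (again a sketch, not production code) can be called from a timer in the service host while the load test runs:

        using System;
        using System.Threading;

        static class ThreadPoolProbe
        {
            // Log how many worker and IO threads are busy; call once a second during the test.
            public static void LogState()
            {
                int minW, minIo, maxW, maxIo, freeW, freeIo;
                ThreadPool.GetMinThreads(out minW, out minIo);
                ThreadPool.GetMaxThreads(out maxW, out maxIo);
                ThreadPool.GetAvailableThreads(out freeW, out freeIo);

                // If busy worker counts climb only slowly while calls queue up during the
                // stall (the pool injects new threads gradually above the minimum),
                // thread-pool ramp-up or starvation is a likely suspect.
                Console.WriteLine("{0:HH:mm:ss} workers busy {1}/{2} (min {3}), IO busy {4}/{5} (min {6})",
                    DateTime.Now, maxW - freeW, maxW, minW, maxIo - freeIo, maxIo, minIo);
            }

            // Optional experiment: pre-grow the pool and see whether the stall shortens.
            public static void PreGrow(int workers, int ioThreads)
            {
                ThreadPool.SetMinThreads(workers, ioThreads);
            }
        }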