wcf  .net-core  azure-functions  wcf-client  azure-durable-functions

Azure Functions Consumption Plan WCF Client Errors "attempt failed because the connected..."


We recently switched our Azure Functions app (built on Durable Functions) from a dedicated S1/Standard App Service plan to the dynamic Y1 plan to save money, and now we are getting a common error:

"A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond."

This happens after about an hour of the app running. The exception comes from a svcutil-generated WCF client. I'm fairly certain this is related to the limit on socket connections from a Consumption function app vs. a "dedicated" app plan, as described at https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#service-limits, but I'm not totally convinced, because I do NOT see the log message "Host thresholds exceeded: Connections" listed at https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections#connection-limit

Our client is actually a wrapper around a dozen WCF clients, all instantiated in the wrapper's constructor. The wrapper is registered with DI as a singleton:

builder.Services.AddSingleton<IWrapperClient, OurSoapClient>();

public OurSoapClient(
            IMemoryCache memoryCache,
            IOptions<Options> options,
            ILogger<OurSoapClient> log
        )
        {
            this.options = options.Value;
            this.memoryCache = memoryCache;
            this.log = log;


            this.metaClient = new Meta.MetaWebServiceClient(
                Meta.MetaWebServiceClient.EndpointConfiguration.MetaWebServicePort,
                this.options.MetaHref
            );
            

            this.wmsClient = new Wms.WmsWebServiceClient(
                Wms.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsStageItemsClient = new Wms.Stage.Items.WmsWebServiceClient(
                Wms.Stage.Items.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsReceiptClient = new Wms.Stage.ExpectedReceipts.WmsWebServiceClient(
                Wms.Stage.ExpectedReceipts.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsStageRmaClient = new Wms.Stage.Rma.WmsWebServiceClient(
                Wms.Stage.Rma.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsStageShipmentsClient = new Wms.Stage.Shipments.WmsWebServiceClient(
                Wms.Stage.Shipments.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );


            this.wmsUpdateShipmentsClient = new Wms.Updates.ShippingResults.WmsWebServiceClient(
                Wms.Updates.ShippingResults.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsUpdatesReceivingResultsClient = new Wms.Updates.ReceivingResults.WmsWebServiceClient(
                Wms.Updates.ReceivingResults.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsUpdatesInventoryAdjustmentClient = new Wms.Updates.InventoryAdjustments.WmsWebServiceClient(
                Wms.Updates.InventoryAdjustments.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsInboundOrderClient = new Wms.Inbound.CurrentAndHistory.WmsWebServiceClient(
                Wms.Inbound.CurrentAndHistory.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsOutboundOrderClient = new Wms.Outbound.CurrentAndHistory.WmsWebServiceClient(
                Wms.Outbound.CurrentAndHistory.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsInboundOrderDetailsClient = new Wms.Inbound.CurrentAndHistoryDetails.WmsWebServiceClient(
                Wms.Inbound.CurrentAndHistoryDetails.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );

            this.wmsOutboundOrderDetailsClient = new Wms.Outbound.CurrentAndHistoryDetails.WmsWebServiceClient(
                Wms.Outbound.CurrentAndHistoryDetails.WmsWebServiceClient.EndpointConfiguration.WmsWebServicePort,
                this.options.WmsHref
            );
        }

Switching back to the Standard App Service plan seems to make this go away. I'm fairly certain Durable Functions isn't a cause here, but just to be clear, all the calls to the client happen from Orchestrator or Activity functions, and we see the same errors in both function types.

One thing I've noticed repeatedly is that the errors seem to occur just after a second OurWrapperClient is instantiated (which instantiates all the WCF clients again). Since it's a singleton, this must be the Azure Functions control plane spinning up another instance of my app.

So, a couple of questions:

  1. Any idea how to prove this is a max-outbound-connections issue?
  2. Any suggestions as to why this becomes a problem on the Consumption plan?
  3. Assuming this is related to WCF:
    1. What's the correct way to use WCF clients? Should they be instantiated for each call with usings, or is it OK to instantiate them once per wrapper client, as we have, and dispose them only once?
    2. Should we register them as singletons with DI and inject them instead? This means DI would call Dispose on them, I believe.
    3. Is there any way to pass the HTTP client to be used to the generated WCF client code? A lot of the Azure Functions best practices say to have a single injected HTTP client for all your HTTP I/O, but I don't see how to do that with WCF.

Solution

  • Using App Insights, I noticed that the "takes about an hour" pattern corresponded to my app switching host instances around that time. Eventually I started to see that on deploys it would fail right away, i.e. I got a "bad" host. I opened an MS support case; they remoted into a bad instance and found they could not TCP ping from that host.

    Each webspace you are assigned makes requests from a pool of IPs. I suspect my target's WAF was blocking some of these IPs for whatever reason. Switching to a new region, which guaranteed a new webspace (they're assigned on creation, but are region-specific), made the problem go away.

    I did find https://github.com/dotnet/runtime/issues/35508 during this, which seemed similar.
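
The per-instance check that support ran can be approximated from inside the app itself. The following is only a rough sketch, assuming .NET 5 or later; `OutboundProbe`, `ProbeResult`, and `ProbeAsync` are hypothetical names and not part of the original code. It attempts a plain TCP connect to a host and port and tags the result with the `WEBSITE_INSTANCE_ID` environment variable, so failures can be grouped by host instance in the logs.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class OutboundProbe
{
    // Outcome of one TCP connect attempt made from the current host instance.
    // Error is an empty string when the connect succeeded.
    public sealed record ProbeResult(
        string InstanceId, string Host, int Port,
        bool Connected, long ElapsedMs, string Error);

    public static async Task<ProbeResult> ProbeAsync(string host, int port, TimeSpan timeout)
    {
        // Assumption: WEBSITE_INSTANCE_ID identifies the instance when hosted in Azure.
        // When it is not set (e.g. running locally), fall back to the machine name.
        string instanceId =
            Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID") ?? Environment.MachineName;

        var stopwatch = Stopwatch.StartNew();
        using var cts = new CancellationTokenSource(timeout);
        using var tcp = new TcpClient();

        try
        {
            // Only the TCP handshake is attempted; no HTTP or SOAP traffic is sent.
            await tcp.ConnectAsync(host, port, cts.Token);
            return new ProbeResult(instanceId, host, port, true, stopwatch.ElapsedMilliseconds, "");
        }
        catch (OperationCanceledException)
        {
            // The handshake did not complete within the timeout.
            return new ProbeResult(instanceId, host, port, false, stopwatch.ElapsedMilliseconds, "TimedOut");
        }
        catch (SocketException ex)
        {
            // Refused, unreachable, DNS failure, etc.
            return new ProbeResult(instanceId, host, port, false, stopwatch.ElapsedMilliseconds,
                ex.SocketErrorCode.ToString());
        }
    }
}
```

Calling something like `await OutboundProbe.ProbeAsync(new Uri(options.WmsHref).Host, 443, TimeSpan.FromSeconds(10))` from an Activity function and logging the result would show whether failures cluster on specific instance IDs. Port 443 is an assumption here; use whatever port the SOAP endpoint actually listens on.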