Tags: azure, azure-functions, azure-functions-core-tools

batchSize in host.json ignored locally


I'm setting batchSize to 1 in host.json, but when I run the function locally (QueueTrigger input), I get many messages from my queue at once when it should only be 1.

host.json:

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout": "00:00:30",
      "batchSize": 1,
      "maxDequeueCount": 5,
      "newBatchThreshold": 8,
      "messageEncoding": "base64"
    }
  }
}

Even in the console window logs it says "BatchSize": 1.

Why is this setting being ignored? As far as I understand, it should apply locally and when deployed.

When I delete the queue settings from host.json and instead override batchSize in local.settings.json, it works:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName={MyDevAccount};AccountKey={MyAccKey};EndpointSuffix=core.windows.net",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated",
    "AzureFunctionsJobHost__extensions__queues__batchSize": 1
  }
}

Now I only get 1 message even though my storage queue has many messages.

But when I define batchSize in host.json again, even the override in local.settings.json seems to be ignored.

Is this a bug, or what am I missing? What if I want to customize my batchSize for the cloud, but keep it 1 locally?


Solution

  • Quick answer

    Remove the newBatchThreshold from your host.json.

    Parallelisation for queue triggers on a given instance is controlled by batchSize and newBatchThreshold together.

    You'll notice that you haven't got that setting in your local.settings.json, which explains the difference in behaviour.
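
    For example (a sketch based on your own host.json, with only newBatchThreshold removed):

    {
      "version": "2.0",
      "extensions": {
        "queues": {
          "maxPollingInterval": "00:00:02",
          "visibilityTimeout": "00:00:30",
          "batchSize": 1,
          "maxDequeueCount": 5,
          "messageEncoding": "base64"
        }
      }
    }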

    More information

    The logic is more complex than just the batch size. Instead, it checks whether the queue processor is about to run out of messages and, if so, fetches another batch. There's more info in the source code^1, which has this comment:

    The job keeps requesting messages in batches of BatchSize size until number of messages currently being processed is above NewBatchThreshold

    So in your host.json example, which has newBatchThreshold set to 8, it will keep making quick retrievals of 1 message until more than 8 are in flight - so effectively up to 9 messages (batchSize + newBatchThreshold) being processed at once, which is probably what you're observing.
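
    To make that concrete, here's a tiny self-contained simulation of the fetch loop that comment describes (a deliberate simplification, not the actual WebJobs SDK code, and it assumes no message finishes processing during the ramp-up):

    using System;

    int batchSize = 1;         // as in your host.json
    int newBatchThreshold = 8;
    int inFlight = 0;

    // Keep requesting batches while the number of in-flight messages
    // has not yet risen above the threshold (per the comment quoted above).
    while (inFlight <= newBatchThreshold)
    {
        inFlight += batchSize; // pretend a full batch was fetched and is now being processed
    }

    Console.WriteLine(inFlight); // prints 9, i.e. batchSize + newBatchThreshold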

    Default value of newBatchThreshold

    Looking at the source again^1, the default used when you don't specify newBatchThreshold explicitly is:

    return (_batchSize / 2) * _processorCount;
    

    If your batch size is 1, then 1 / 2 is 0 in integer maths, so the default newBatchThreshold works out to zero. In that case it won't fetch more messages while one is still being processed, which is the behaviour you originally expected - and it's why the local.settings.json override behaved the way you wanted.
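
    As a quick check of that arithmetic (the processor count below is just an example value):

    using System;

    int batchSize = 1;
    int processorCount = 4; // example value; the real code uses the machine's processor count

    int newBatchThreshold = (batchSize / 2) * processorCount; // integer division: 1 / 2 == 0
    Console.WriteLine(newBatchThreshold);                      // prints 0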

    Further thoughts: processing 1 message at a time

    This doesn't relate to your question specifically, but note that a batch size of 1 (even disregarding the new batch threshold) must never be relied upon for processing exactly 1 message at a time. One major reason is that the batch size only applies to a single instance of the application; if you genuinely need that guarantee, you'll need some other concurrency control. In your case, I think you're only using a batch size of 1 for local debugging, so I guess you're not worried about this.
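
    Different batchSize locally and in the cloud

    As for keeping batchSize at 1 locally but a larger value in the cloud: one option, using the same override mechanism you already found, is to put the cloud value in host.json, leave newBatchThreshold unset (so its default is derived from whatever batch size is in effect), and override batchSize locally in local.settings.json, e.g.:

    {
      "IsEncrypted": false,
      "Values": {
        "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName={MyDevAccount};AccountKey={MyAccKey};EndpointSuffix=core.windows.net",
        "FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated",
        "AzureFunctionsJobHost__extensions__queues__batchSize": "1"
      }
    }

    The same AzureFunctionsJobHost__extensions__queues__batchSize key can also be set as an application setting in Azure if you ever need to change the value there without redeploying host.json.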