azure, azure-functions, azure-servicebus-queues

Azure service bus trigger function app stops monitoring its queue


We have a Service Bus queue trigger function app that, on rare occasions (99.999% uptime), stops performing its job. The queue-monitoring side of the function app simply stops working, even though the function app still shows as running. We have found no errors anywhere (function app logs, Application Insights, Service Bus logs, etc.) that explain why the function app does not recognize new messages in the Service Bus queue. After a restart, the function app processes the messages waiting in the queue.

We have seen this behavior in both our production and UAT/testing environments for about two of our Service Bus trigger function apps; however, we have other function apps of the same trigger type that have yet to exhibit this behavior. The only difference between the two environments is that we use a Premium Service Bus for production.

So, the million-dollar question is: why do the function apps stop seeing new messages in the queues they are monitoring, until they are restarted, given that they are not on a Consumption plan and are configured to be always on?

Production:

Function App:
    Runtime Version - ~4
    .NET Version - .NET 6 (LTS) Isolated (I know, we are going to 8 soon. :) )
    Type - Service Bus Trigger
    Always On Setting - True
    Number of functions - 1
    Storage account - specific only to the function app

App Service Plan:
    Type -  P2v3
    # of Apps - 27

Service Bus:
    Type - Premium

Queue monitored:
    Session enabled: false

A ticket with Microsoft has been open for the last two days, but they do not have a solution at this point. Our interactions with their support team have, thus far, confirmed that our setup is coded and implemented/configured correctly.


Solution

  • Issue:

    Why a problem occurred:

    Mitigation:

    Running with at least two instances of the app service plan:

    Using Azure Health Check:

    Code Solution:

    Azure Solution:

    Conclusion:

    So, in essence, our problem was that an update occurred via Microsoft maintenance, and our environment was not able to cope when that update caused a problem. There is no way to be 100% safe when dealing with updates, but by keeping at least two instances active we should be able to avoid future problems related to maintenance updates. I am also exploring an Azure Monitor alert to inform us if messages stay in the queue longer than we would expect. I'll explore the code solution if Azure Monitor does not work for our case.

    Shout-out/Response to others:

    Finally, thank you, Vivek, for your suggestions. In this case, a timer trigger that only checks whether the queue-monitoring function app is idle/not running would not work: the function app is set to always be running, and it was active. It had just lost the ability to see new messages in the queue for processing.
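
To make the alerting idea from the conclusion a bit more concrete (being told when messages sit in the queue longer than expected while the app still reports as running), here is a minimal sketch of the stall check in Python. It is only an illustration under stated assumptions: `QueueSample`, `is_queue_stalled`, and the two counters are hypothetical names, not Azure types or APIs, and the samples would have to come from whatever metrics source is actually used. The same logic should port directly to the function app's own language.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class QueueSample:
    """One observation of the queue (hypothetical shape, not an Azure type)."""
    taken_at: datetime
    active_messages: int   # messages currently waiting in the queue
    completed_total: int   # running count of messages processed so far


def is_queue_stalled(samples: Sequence[QueueSample], max_idle: timedelta) -> bool:
    """Return True if a backlog has existed for at least max_idle
    without the consumer completing a single message.

    Samples are assumed to be ordered oldest to newest.
    """
    if len(samples) < 2:
        return False  # not enough history to judge

    latest = samples[-1]
    if latest.active_messages == 0:
        return False  # nothing is waiting, so nothing can be stuck

    # Walk backwards while the backlog persists and no progress was made.
    stalled_since = latest.taken_at
    for sample in reversed(samples[:-1]):
        no_backlog = sample.active_messages == 0
        made_progress = sample.completed_total != latest.completed_total
        if no_backlog or made_progress:
            break
        stalled_since = sample.taken_at

    return latest.taken_at - stalled_since >= max_idle
```

The check deliberately looks at both the backlog and the completed count: a queue that is merely busy keeps completing messages, whereas the failure described above shows a growing backlog with no completions at all, which is the case a "is the app running?" probe cannot see.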