azure azure-webjobs webjob azure-webjobs-continuous

Azure WebJob fails after some period of time on a Linux plan


Issue:

A continuous WebJob (the "Always On" feature is enabled for the App Service) is deployed to a web app on a Linux plan. It works for a random number of days (sometimes 2-3 days, sometimes a full week), then it partially fails (what "partially" means is explained in the description below) and can't be restored without a redeploy.

Full description:

  1. A simple .NET 8.0 app, implemented as a WebJob, reads messages from Azure Service Bus (just a queue on the "Basic" Service Bus tier). The WebJob is a singleton, and the website is manually set to scale out to just 1 instance. The app waits for new messages in the queue and processes them in near real time.
  2. I chose a Linux plan over a Windows plan because of the cost, on the "Basic" web app tier. I deployed using Azure DevOps pipelines. All good: everything worked for a day or two.
  3. Then I noticed that Service Bus was accumulating messages and nothing was processing them. I checked the status of the WebJob: it was "InactiveInstance". Oddly, you can't restart it; the only option was to delete and redeploy. I redeployed. Same story again after a couple of days.
  4. It's worth mentioning that the WebJobs feature for the Linux plan was in Preview then (April 2024): in the Azure portal, the feature name was followed by the word "Preview". So I decided to switch to a Windows plan and test everything there. In short: no issues on Windows so far (the WebJob has been up for ~2 months with no weird behavior).
  5. In June I stumbled upon a blog post from Microsoft saying that WebJobs for Linux is GA. The "Preview" label had indeed disappeared from the feature name, so I tried again. I deployed the app on Linux again and, additionally, implemented a custom WebJob health check. This check, implemented as an Azure Function, sends an HTTP request to the Kudu API every 10 minutes (the API can list all jobs for a site and get a specific job as well). I know we can do the same with the Azure Management API (the more generic one); I just decided to use Kudu. If the WebJob is not available, the Function app sends an alert email, so I don't need to check the WebJob status manually.
  6. The WebJob again worked for almost a week and then failed again. Below is a more detailed description of what I noticed.
  7. At some point the Function app started emailing that my WebJob was unavailable: all HTTP requests to the Kudu API were failing with 504 (Gateway Timeout). Same story with the (more generic) Azure Management API: 504.
  8. I checked the WebJob in the Azure portal. Weird: the job was not there. I refreshed and checked the HTTP traffic for the WebJobs page using the browser developer tools. Same issue: the portal was sending an HTTP request to get the list of jobs for my website but was receiving the same 504 error.
  9. The weirdest thing is that the WebJob WAS STILL WORKING/ACTIVE. It was still consuming messages from Azure Service Bus AND was successfully processing them (these messages should appear as records, created via API, in another custom system that has a UI, so I could verify that messages were still arriving in that system). Weird!
  10. I kept the job running (in this kind of invalid state) for 1 day. Nothing changed: it was still not in the list of jobs, yet it was still processing messages from Service Bus.
  11. I restarted the app service (site), and only after that (at least I believe so) did the job appear in the list again (in the portal), now in the "InactiveInstance" state, and it stopped consuming messages from Service Bus. There is nothing you can do with the job in such a state except delete and redeploy.
  12. If I redeploy, the story repeats: the job works for up to 7 days and "fails" again.
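
For readers who want to reproduce the health check from step 5, here is a minimal sketch in Python. It is not the asker's Azure Function, whose code is not shown. It assumes the Kudu endpoint `https://{site}.scm.azurewebsites.net/api/continuouswebjobs/{job}`, basic authentication with deployment credentials (which may be disabled on your site), and a JSON response carrying a `status` field. The function names and verdict strings are made up for illustration. It reports a gateway error from Kudu separately from an "InactiveInstance" status, because steps 7-11 suggest these are different failure modes.

```python
import base64
import json
import urllib.error
import urllib.request


def classify(http_status, body):
    """Map a Kudu HTTP status and response body to a coarse health verdict."""
    if http_status in (502, 503, 504):
        # Kudu itself did not answer; the job may still be running (step 9).
        return "kudu-unreachable"
    if http_status == 404:
        return "job-missing"
    if http_status != 200:
        return "unexpected-http-%d" % http_status
    try:
        status = json.loads(body).get("status") or "unknown"
    except (ValueError, AttributeError):
        return "bad-response"
    return "ok" if status == "Running" else "job-" + status.lower()


def check_job(site, job, user, password, timeout=30):
    """Query Kudu for one continuous WebJob and return a verdict string.

    The URL shape and basic auth are assumptions; adjust them to your setup.
    """
    url = f"https://{site}.scm.azurewebsites.net/api/continuouswebjobs/{job}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, response.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here; only the status code is needed.
        return classify(err.code, b"")
    except OSError:
        # DNS failures, refused connections, and socket timeouts.
        return "kudu-unreachable"
```

Because a "kudu-unreachable" verdict does not prove the job has stopped (see step 9), it may be worth pairing this with an independent signal such as the Service Bus queue length before alerting.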

I am happy to provide more details if needed, but it looks like a bug specific to deploying to a Linux plan.

Short log from Kudu:

[06/16/2025 00:40:47 > 985515: SYS INFO] WebJob is still running
[06/16/2025 09:19:21 > 985515: SYS INFO] Status changed to Starting
[06/16/2025 09:19:22 > 985515: SYS INFO] WebJob singleton setting is True
[06/16/2025 09:19:26 > 985515: SYS INFO] Status changed to InactiveInstance
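
If you have a longer log than this excerpt, a small script can pull out the status transitions and the longest silent period. This is only a sketch: it assumes every line follows the `[MM/DD/YYYY HH:MM:SS > id: SYS INFO] message` shape seen above, and the helper names are invented for illustration.

```python
import re
from datetime import datetime

# The four lines quoted in the question.
SAMPLE = [
    "[06/16/2025 00:40:47 > 985515: SYS INFO] WebJob is still running",
    "[06/16/2025 09:19:21 > 985515: SYS INFO] Status changed to Starting",
    "[06/16/2025 09:19:22 > 985515: SYS INFO] WebJob singleton setting is True",
    "[06/16/2025 09:19:26 > 985515: SYS INFO] Status changed to InactiveInstance",
]

LINE = re.compile(
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) > (\w+): SYS INFO\] (.*)"
)
PREFIX = "Status changed to "


def parse(lines):
    """Return (timestamp, instance_id, message) for each line that matches."""
    events = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return events


def status_changes(events):
    """Return (timestamp, new_status) for every 'Status changed to' entry."""
    return [
        (stamp, message[len(PREFIX):])
        for stamp, _, message in events
        if message.startswith(PREFIX)
    ]


def longest_gap(events):
    """Return (duration, start, end) of the longest pause between entries."""
    gaps = [(b[0] - a[0], a[0], b[0]) for a, b in zip(events, events[1:])]
    return max(gaps, default=None)


if __name__ == "__main__":
    parsed = parse(SAMPLE)
    for stamp, status in status_changes(parsed):
        print(stamp, status)
    print("longest gap:", longest_gap(parsed)[0])  # 8:38:34 for SAMPLE
```

On the excerpt, the job logs nothing for more than eight and a half hours before it goes through "Starting" and lands in "InactiveInstance" five seconds later.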

Below is also a screenshot of an API call from the MS docs to get WebJob info for my subscription and site: 504 as a result.

Thank you all!


Solution

  • Although the "Always On" feature is available in the Linux Basic tier, its behavior is unpredictable for continuous WebJobs. Unlike Windows, Linux App Service Plans do not officially support continuous WebJobs, and background tasks may stop running due to app unloading during idle time, even with "Always On" enabled.

    Since you're using a continuous WebJob on a Linux App Service Plan, it needs "Always On" to run reliably. Without it, the job may stop or behave unpredictably due to the app being unloaded during idle time.

    Please refer to this MS doc to learn more about WebJobs.

    To resolve the issue,

    Refer to this MS doc to learn about Azure App Service plan tiers and features.