[SOLVED] When Hangfire is deployed in an Azure WebJob, multiple WebJob instances pick up the same Hangfire job


When Hangfire is deployed in an Azure WebJob, multiple WebJob instances pick up the same jobs.
I am using Hangfire version 1.6.21.
I have deployed Hangfire in an Azure WebJob.
I am using Azure SQL Database as the Hangfire storage.
When I run the query below on the HangFire.State table, I see multiple "Processing" (in progress) records for the same job.
This happens in three cases: when the Azure instances are scaled in, when an instance becomes unavailable, or when the same job is picked up by two instances simultaneously.
All of these scenarios are intermittent and affect fewer than 5% of the jobs.
Is there a quick code fix or workaround?
Upgrading to the latest Hangfire, or switching to a combination of Service Bus and Azure Functions, will take time.

SELECT JobId, COUNT(*) AS ProcessingStates
FROM [HangFire].[State] WITH (NOLOCK)
WHERE [Name] = 'Processing' AND [JobId] IN (1, 2, 3)
GROUP BY JobId
HAVING COUNT(*) > 1;

We set up fixed scaling (a fixed instance count) for the Azure WebJob to work around the issue.

Solution

This issue happens mainly during scale-out and scale-in. During scale-out, multiple WebJob instances may pick up the same Hangfire job because of race conditions in how Hangfire fetches and locks jobs in SQL Server.
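A minimal sketch of why the claim step has to be atomic. The `ConcurrentDictionary` below is a hypothetical in-memory stand-in for the job queue table, not Hangfire's code: `TryAdd` lets exactly one of several racing "instances" claim a given job id, which is the guarantee the storage-level fetch and lock is meant to provide. It uses C# 9 top-level statements and only the base class library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the job queue: job id -> instance that claimed it.
var claims = new ConcurrentDictionary<int, string>();

// Atomic claim: TryAdd succeeds for exactly one caller per job id.
bool TryClaim(int id, string instance) => claims.TryAdd(id, instance);

const int jobId = 42;
string[] instances = { "instance-A", "instance-B", "instance-C" };

// All instances race to claim the same job at the same moment.
bool[] results = await Task.WhenAll(
    instances.Select(name => Task.Run(() => TryClaim(jobId, name))));

int winners = results.Count(claimed => claimed);
Console.WriteLine($"Job {jobId} claimed by {claims[jobId]} (winners: {winners})");
```

If the claim were instead a separate "is it free?" check followed by a later write, two instances could both pass the check before either writes, which is the kind of window that produces duplicate pickups.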

During scale-in, if an instance is removed while it is running a job, that job can be restarted on another instance, leading to duplicates or delays. The App Service Always On setting keeps the WebJob host running, but it does not prevent this behavior during scaling.
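The scale-in case follows from fetching jobs with a timeout. The sketch below is a hypothetical in-memory model, not Hangfire's implementation: a worker claims a job together with a timestamp, and once that claim is older than the timeout, another instance is allowed to fetch the job again. As far as we understand the 1.6 SQL Server storage, its `InvisibilityTimeout` option (30 minutes by default) plays this role, which would also explain duplicates for jobs that simply run longer than the timeout. The 30-minute value and instance names below are assumptions for illustration.

```csharp
using System;

// Hypothetical model of one queue entry; the 30-minute timeout is assumed for illustration.
var timeout = TimeSpan.FromMinutes(30);
string fetchedBy = "";      // instance holding the job ("" = nobody)
DateTime? fetchedAt = null; // when the job was fetched

// A fetch succeeds if nobody holds the job or the current holder's claim has expired.
bool TryFetch(string instance, DateTime now)
{
    if (fetchedAt is null || now - fetchedAt.Value > timeout)
    {
        fetchedBy = instance;
        fetchedAt = now;
        return true;
    }
    return false;
}

var t0 = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);

bool aFetched = TryFetch("instance-A", t0);              // A starts the job
bool bEarly = TryFetch("instance-B", t0.AddMinutes(5));  // claim still valid: refused
// instance-A is scaled in here and never completes or releases the job.
bool bLate = TryFetch("instance-B", t0.AddMinutes(31));  // claim expired: B runs the job again

Console.WriteLine($"A: {aFetched}, B early: {bEarly}, B late: {bLate}, holder: {fetchedBy}");
```

The same rule that rescues a job orphaned by a removed instance cannot tell an orphaned job from one that is still running slowly, so both cases can lead to a second execution.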

To resolve the issue:

Limit the worker count to 1 per instance, so that each instance runs only one job at a time and job duplication is less likely.

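// OWIN startup: run a single worker on this instance
var options = new BackgroundJobServerOptions
{
    WorkerCount = 1 // at most one job at a time per instance
};
app.UseHangfireServer(options);

// In a console WebJob without OWIN, start the server directly instead:
// using (var server = new BackgroundJobServer(options)) { /* keep the WebJob running */ }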

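Apply the [DisableConcurrentExecution] attribute to long-running or sensitive job methods. It acquires a distributed lock in job storage before the method runs, so two workers, on the same or different instances, cannot execute that method at the same time.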

// Requires: using Hangfire;
[DisableConcurrentExecution(timeoutInSeconds: 300)] // wait up to 5 minutes for the lock
public void ProcessJob()
{
    // Job logic goes here.
}
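To make the timeout behavior concrete, here is an in-process analogue, not Hangfire's implementation: a `SemaphoreSlim` stands in for the storage-level lock, and `lockTimeout` plays the role of `timeoutInSeconds`. A second caller waits up to the timeout and then fails instead of running the method in parallel. The worker names and durations are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// In-process stand-in for the storage-level lock (hypothetical; not Hangfire's implementation).
var methodLock = new SemaphoreSlim(1, 1);
var lockTimeout = TimeSpan.FromMilliseconds(200); // plays the role of timeoutInSeconds
var lockHeld = new ManualResetEventSlim(false);

void ProcessJob(string worker, TimeSpan workDuration)
{
    // Wait up to the timeout for the lock; fail the job if it is still held.
    if (!methodLock.Wait(lockTimeout))
        throw new TimeoutException($"{worker}: ProcessJob is already running");
    try
    {
        lockHeld.Set();
        Thread.Sleep(workDuration); // simulated long-running work
    }
    finally
    {
        methodLock.Release();
    }
}

// worker-1 takes the lock and works for one second.
Task first = Task.Run(() => ProcessJob("worker-1", TimeSpan.FromSeconds(1)));
lockHeld.Wait(); // make sure worker-1 holds the lock before worker-2 tries

string outcome = "worker-2 ran";
try
{
    ProcessJob("worker-2", TimeSpan.Zero);
}
catch (TimeoutException ex)
{
    outcome = ex.Message;
}

await first; // worker-1 finishes normally
Console.WriteLine(outcome);
```

Unlike this sketch, Hangfire keeps the lock in job storage (SQL Server here), so it applies across instances; a job that times out waiting for the lock fails and would normally be picked up again by the automatic retry filter.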

Use a database or cache to record jobs that have already run, and check this at the start of each job to avoid running it again.
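A minimal sketch of recording jobs that have already run, assuming each job has a stable business key such as an invoice number (the keys and the `RunOnce` helper are hypothetical). A `ConcurrentDictionary` stands in for the database table or cache; in a real deployment the claim would typically be an insert into a table with a unique key, so that it is atomic across instances. The key is claimed before the work starts, because a check followed by a separate insert could let two instances through.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a table or cache of job keys that have already run.
var processed = new ConcurrentDictionary<string, DateTime>();
int executions = 0;

// Runs the work only if the key has not been recorded; returns true if it ran.
bool RunOnce(string jobKey, Action work)
{
    // Claim the key atomically before doing any work.
    if (!processed.TryAdd(jobKey, DateTime.UtcNow))
    {
        Console.WriteLine($"Skipping {jobKey}: already processed");
        return false;
    }
    try
    {
        work();
        return true;
    }
    catch
    {
        // Release the claim so a later retry can run the job again.
        processed.TryRemove(jobKey, out _);
        throw;
    }
}

bool first = RunOnce("invoice-1001", () => executions++);
bool second = RunOnce("invoice-1001", () => executions++); // duplicate pickup is skipped

// A failed attempt releases its claim, so the retry runs.
try { RunOnce("invoice-2002", () => throw new InvalidOperationException("boom")); }
catch (InvalidOperationException) { }
bool retried = RunOnce("invoice-2002", () => executions++);

Console.WriteLine($"first={first}, second={second}, retried={retried}, executions={executions}");
```

One trade-off: if an instance is removed mid-job, its claim stays recorded and a rerun would be skipped. Storing a status and timestamp with each key, and treating old unfinished claims as stale, is one way to handle that.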
Log which instance picks the job, along with timestamps, to better understand concurrency behavior.

// In a WebJob, console output is captured in the WebJob log; :O gives an ISO 8601 UTC timestamp
Console.WriteLine($"Job started by {Environment.MachineName} at {DateTime.UtcNow:O}");

If possible, upgrade to Hangfire 1.7+ for better handling of job locking with SQL Server. Or consider using Durable Functions or Azure Service Bus with Azure Functions for more reliable job processing across multiple instances.