This is my job. It takes about 3 to 5 minutes to complete each time:
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Quartz;

[DisallowConcurrentExecution]
[PersistJobDataAfterExecution]
public class UploadNumberData : IJob
{
    private readonly IServiceProvider serviceProvider;

    public UploadNumberData(IServiceProvider serviceProvider)
    {
        this.serviceProvider = serviceProvider;
    }

    public async Task Execute(IJobExecutionContext context)
    {
        // Read the serialized payload back out of the merged job data map
        var jobDataMap = context.MergedJobDataMap;
        string flattenedInput = jobDataMap.GetString("FlattenedInput");
        string applicationName = jobDataMap.GetString("ApplicationName");
        var parsedFlattenedInput = JsonSerializer.Deserialize<List<NumberDataUploadViewModel>>(flattenedInput);
        var parsedApplicationName = JsonSerializer.Deserialize<string>(applicationName);

        using (var scope = serviceProvider.CreateScope())
        {
            //Run Process
        }
    }
}
This is the function that calls the job:
try
{
    var flattenedInput = JsonSerializer.Serialize(Input.NumData);
    var triggerKey = Guid.NewGuid().ToString();

    IJobDetail job = JobBuilder.Create<UploadNumberData>()
        .UsingJobData("FlattenedInput", flattenedInput)
        .UsingJobData("ApplicationName", flattenedApplicationName)
        .StoreDurably()
        .WithIdentity("BatchNumberDataJob", "GP_BatchNumberDataJob")
        .Build();

    await scheduler.AddJob(job, true);

    ITrigger trigger = TriggerBuilder.Create()
        .ForJob(job)
        .WithIdentity(triggerKey, "GP_BatchNumberDataJob")
        .WithSimpleSchedule(x => x.WithMisfireHandlingInstructionFireNow())
        .StartNow()
        .Build();

    await scheduler.ScheduleJob(trigger);
}
catch (Exception e)
{
    //log
}
Each job consists of 300 rows of data; the total is about 14,000 rows divided into 47 jobs.
This is the configuration:
NameValueCollection quartzProperties = new NameValueCollection
{
    { "quartz.serializer.type", "json" },
    { "quartz.jobStore.type", "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz" },
    { "quartz.jobStore.dataSource", "default" },
    { "quartz.dataSource.default.provider", "MySql" },
    { "quartz.dataSource.default.connectionString", "connectionstring" },
    { "quartz.jobStore.driverDelegateType", "Quartz.Impl.AdoJobStore.MySQLDelegate, Quartz" },
    { "quartz.jobStore.misfireThreshold", "3600000" }
};
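For context, these properties are applied when the scheduler is built. A minimal sketch, assuming Quartz.NET 3.x:

using System.Collections.Specialized;
using Quartz;
using Quartz.Impl;

// Build and start a scheduler from the properties above
var factory = new StdSchedulerFactory(quartzProperties);
IScheduler scheduler = await factory.GetScheduler();
await scheduler.Start();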
The problem now is that when I hit the function/API, only the first and the last job get inserted into the database. Strangely, the last job also repeats itself multiple times.
I tried changing the job identity name to something different, but I then get foreign key errors as my data is being inserted into the database.
The sequence should be Job 1, Job 2, ..., Job 47. However, only Job 1 appears, followed by Job 47 repeated multiple times.
EDIT:
When I set the thread count to 1 and changed the job identity to be dynamic, it works. However, does this defeat the purpose of DisallowConcurrentExecution?
I reproduced your problem and found how you should rewrite your code to get the expected behaviour, as I understand it.
First of all, I see you use the same identity for every job you execute. The duplication happens because you use the same identity together with the 'replace' flag set to 'true' in the AddJob method call.
You are on the right track with dynamic identity generation for each job; it could be a new GUID or an incremental int counter for each identity. Something like this:
// 'i' is a job counter (0, 1, 2, ...)
.WithIdentity($"BatchNumberDataJob-{i}", "GP_BatchNumberDataJob")
// or
.WithIdentity(Guid.NewGuid().ToString(), "GP_BatchNumberDataJob")

// You may also want to set the 'replace' flag to 'false'
// so that an 'already exists' error is raised if a key collision occurs;
// you may want to handle such cases.
await scheduler.AddJob(job, false);
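Putting it together, scheduling all 47 batches might look like the sketch below. This is a sketch only: the 'batches' collection is an assumption; the rest follows your own snippets.

// Sketch: 'batches' (the 47 lists of 300 rows), 'scheduler' and
// 'flattenedApplicationName' are assumed to exist in your code.
int i = 0;
foreach (var batch in batches)
{
    var flattenedInput = JsonSerializer.Serialize(batch);

    IJobDetail job = JobBuilder.Create<UploadNumberData>()
        .UsingJobData("FlattenedInput", flattenedInput)
        .UsingJobData("ApplicationName", flattenedApplicationName)
        .StoreDurably()
        // Unique job key per batch, so no job replaces another
        .WithIdentity($"BatchNumberDataJob-{i}", "GP_BatchNumberDataJob")
        .Build();

    // 'replace: false' surfaces an error instead of silently overwriting
    await scheduler.AddJob(job, false);

    ITrigger trigger = TriggerBuilder.Create()
        .ForJob(job)
        .WithIdentity($"BatchNumberDataJob-Trigger-{i}", "GP_BatchNumberDataJob")
        .StartNow()
        .Build();

    await scheduler.ScheduleJob(trigger);
    i++;
}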
After that, you can remove the [DisallowConcurrentExecution] attribute from the job: it is based on the job key, so it no longer has any effect once every job gets a unique identity.
Basically, you have a few options for how to execute your jobs; it really depends on what you are trying to achieve.
The first option is parallel execution. This is the fastest way to execute your code, and each job is completely separated from the others. To do so, you should prepare your database for such a case (because, as you said, you get foreign key errors when you try to achieve that behaviour). It is hard to say exactly what you should change, because you say nothing about your database schema. If your jobs need to run in a specific order, this method is not for you.
The other option is ordered execution. If (for some reason) you are not able to prepare your database to handle parallel job execution, you can use this method. It is much slower than parallel execution, but the order in which the jobs execute is deterministic.
You can achieve this behaviour in two ways:
- use job chaining (see this question; a minimal listener sketch follows below), or
- set up the maximum concurrency for the scheduler:
var quartzProperties = new NameValueCollection
{
    { "quartz.threadPool.maxConcurrency", "1" },
};
With this setting, jobs are executed strictly one at a time, in the order you trigger them, completely without parallelism.
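For the chaining approach, here is a minimal sketch of a job listener that triggers the next job in a predefined list once the previous one finishes. The ChainJobListener class and the 'jobKeys' list are assumptions of this sketch; the listener methods follow the Quartz.NET 3.x async API.

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl.Matchers;

public class ChainJobListener : IJobListener
{
    private readonly List<JobKey> chain;

    public ChainJobListener(List<JobKey> chain)
    {
        this.chain = chain;
    }

    public string Name => "ChainJobListener";

    public Task JobToBeExecuted(IJobExecutionContext context, CancellationToken cancellationToken = default)
        => Task.CompletedTask;

    public Task JobExecutionVetoed(IJobExecutionContext context, CancellationToken cancellationToken = default)
        => Task.CompletedTask;

    public async Task JobWasExecuted(IJobExecutionContext context, JobExecutionException jobException, CancellationToken cancellationToken = default)
    {
        if (jobException != null)
        {
            return; // stop the chain if a job failed
        }

        // Trigger the next job in the chain, if there is one
        var index = chain.IndexOf(context.JobDetail.Key);
        if (index >= 0 && index + 1 < chain.Count)
        {
            await context.Scheduler.TriggerJob(chain[index + 1], cancellationToken);
        }
    }
}

// Registration: listen only to jobs in the batch group, then trigger
// just the first job; the listener starts each following one.
scheduler.ListenerManager.AddJobListener(
    new ChainJobListener(jobKeys),
    GroupMatcher<JobKey>.GroupEquals("GP_BatchNumberDataJob"));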
In the end, it really depends on what you are trying to achieve. If your priority is speed, you should rework your database and your job to support completely separated job execution, no matter in which order the jobs run. If your priority is ordering, you should use a non-parallel method of job execution. It is up to you.