java spring-batch scheduled-tasks quartz-scheduler crontrigger

Quartz job triggers intermittently miss their scheduled time in the prod environment


I have gone through many questions on Stack Overflow and other sites, but have had no luck resolving my issue.

We have around 35 jobs scheduled between 9:30 and 10 AM, but sometimes 3 to 5 of them miss execution. After we run the missed jobs ad hoc, the system works correctly again from the next day. The problem then recurs after some days or weeks.

We are using Quartz version 2.2.3 and Spring Batch version 4.2.0.RELEASE.

We have not overridden the scheduler thread count, because the setup worked perfectly for a long time and only recently started failing intermittently for some jobs.

Below are the Quartz properties:

<property name="quartzProperties">
    <props>
        <prop key="org.quartz.scheduler.skipUpdateCheck">true</prop>
        <prop key="org.quartz.jobStore.class">org.quartz.impl.jdbcjobstore.JobStoreTX</prop>
        <prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.StdJDBCDelegate</prop>
        <prop key="org.quartz.scheduler.instanceId">AUTO</prop>
        <prop key="org.quartz.jobStore.useProperties">false</prop>
        <prop key="org.quartz.jobStore.tablePrefix">#{'${db.defaultschema}' != '' ? '${db.defaultschema}'+'.QRTZ_' : 'QRTZ_'}</prop>
        <prop key="org.quartz.jobStore.selectWithLockSQL">SELECT * FROM {0}LOCKS UPDLOCK WHERE LOCK_NAME = ?</prop>
        <prop key="org.quartz.jobStore.isClustered">true</prop>
        <prop key="org.quartz.jobStore.dataSource">dataSource</prop>
        <prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
        </prop>
    </props>
</property>

Spring Batch job config:

<batch:job id="reportJob">
    <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="reports-reader" processor="reports-processor"
                writer="reports-writer" commit-interval="0">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:listeners>
        <batch:listener ref="batchJobListener" />
    </batch:listeners>
</batch:job>
<bean id="reports-reader" scope="step"
    class="com.company.reportloader.reader.ReportsItemReader">
    <property name="reportsItemReaderService" ref="reportsItemReaderService"></property>
</bean>

<bean id="reports-processor" class="com.company.reportloader.processor.ReportsItemProcessor"></bean>
<bean id="reports-writer" class="com.company.reportloader.writer.ReportsItemWriter">
</bean>

We override executeInternal of QuartzJobBean and build jobParameters to launch the Spring Batch job, as below:

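@Override
protected void executeInternal(JobExecutionContext context) throws JobExecutionException {
  try {
    launcher.run(job, jobParameters);
  } catch (Exception e) { // JobLauncher.run declares checked exceptions
    throw new JobExecutionException(e);
  }
}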

Any help or pointers would be appreciated.


Solution

  • We had a code issue: our job-edit functionality was updating the trigger's misfire instruction to 2. For a cron trigger, 2 is MISFIRE_INSTRUCTION_DO_NOTHING, so whenever the scheduler considered a job misfired (for reasons we could not pin down), that job was not triggered at its scheduled time. We resolved the issue by setting MISFIRE_INSTR in the QRTZ_TRIGGERS table to 0. For a cron trigger, the misfire instructions are defined as follows:

    **smart policy - default**
    See: withMisfireHandlingInstructionFireAndProceed

    **withMisfireHandlingInstructionIgnoreMisfires**
    MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY (QTZ-283)
    All misfired executions are immediately executed, then the trigger runs back on schedule.
    Example scenario: the executions scheduled at 9 and 10 AM are executed immediately. The next scheduled execution (at 11 AM) runs on time.

    **withMisfireHandlingInstructionFireAndProceed**
    MISFIRE_INSTRUCTION_FIRE_ONCE_NOW
    Immediately executes the first misfired execution and discards the others (i.e. all misfired executions are merged together), then goes back to schedule. No matter how many trigger executions were missed, only a single immediate execution is performed.
    Example scenario: the executions scheduled at 9 and 10 AM are merged and executed only once (in other words, the execution scheduled at 10 AM is discarded). The next scheduled execution (at 11 AM) runs on time.

    **withMisfireHandlingInstructionDoNothing**
    MISFIRE_INSTRUCTION_DO_NOTHING
    All misfired executions are discarded; the scheduler simply waits for the next scheduled time.
    Example scenario: the executions scheduled at 9 and 10 AM are discarded, so basically nothing happens. The next scheduled execution (at 11 AM) runs on time.
    

    After updating misfire_instr to 0, the smart policy (the default) applies, and Quartz kicks off the misfired jobs within 3-5 minutes, once the load reduces.
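
To make the policies above concrete, here is a minimal plain-Java sketch of how the numeric value stored in MISFIRE_INSTR maps to cron-trigger behaviour. It is not Quartz code: the class `CronMisfireDemo`, the enum `Policy` and the method `immediateRuns` are hypothetical names made up for illustration. The numeric codes are the Quartz 2.x constants for cron triggers, and the model assumes the smart policy resolves to fire-once-now for a cron trigger, as the "See:" note above indicates.

```java
public class CronMisfireDemo {

    // Values stored in QRTZ_TRIGGERS.MISFIRE_INSTR for cron triggers (Quartz 2.x constants).
    public enum Policy {
        IGNORE_MISFIRES(-1), // MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY
        SMART_POLICY(0),     // default; treated like FIRE_ONCE_NOW for cron triggers
        FIRE_ONCE_NOW(1),    // MISFIRE_INSTRUCTION_FIRE_ONCE_NOW
        DO_NOTHING(2);       // MISFIRE_INSTRUCTION_DO_NOTHING

        public final int code;

        Policy(int code) {
            this.code = code;
        }

        // Hypothetical helper: decode the numeric column value.
        public static Policy fromCode(int code) {
            for (Policy p : values()) {
                if (p.code == code) {
                    return p;
                }
            }
            throw new IllegalArgumentException("Unknown misfire instruction: " + code);
        }
    }

    // Simplified model: how many catch-up executions run immediately
    // once the scheduler gets around to handling the misfired trigger.
    public static int immediateRuns(Policy policy, int missedFirings) {
        if (missedFirings <= 0) {
            return 0;
        }
        switch (policy) {
            case IGNORE_MISFIRES:
                return missedFirings; // every missed firing is replayed
            case SMART_POLICY:
            case FIRE_ONCE_NOW:
                return 1;             // missed firings are merged into one run
            case DO_NOTHING:
                return 0;             // missed firings are dropped
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        int missed = 2; // the 9 AM and 10 AM firings from the example scenarios
        for (Policy p : Policy.values()) {
            System.out.println(p.code + " " + p + " -> "
                    + immediateRuns(p, missed) + " immediate run(s)");
        }
    }
}
```

In this model a trigger saved with misfire instruction 2 never catches up after a misfire, which matches the skipped jobs we saw, while 0 yields a single catch-up run.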