java spring spring-boot spring-batch

Stop all spring batch jobs at shutdown (CTRL-C)


I have a spring boot / spring batch application, which starts different jobs.

When the app is stopped (CTRL-C), the jobs are left in the running state (STARTED).
Even though CTRL-C gives the app enough time to stop the jobs gracefully, the result is the same as a kill -9.

I've found a way (see below) to gracefully stop all jobs when the application is killed using CTRL-C, but would like to know if there is a better / simpler way to achieve this goal.

Everything below is documentation on how I managed to stop the jobs.

In a blog entry from 부알프레도 a JobExecutionListener is used to register shutdown hooks which should stop jobs:

public class ProcessShutdownListener implements JobExecutionListener {
    private final JobOperator jobOperator;
    ProcessShutdownListener(JobOperator jobOperator) { this.jobOperator = jobOperator; }
     
    @Override public void afterJob(JobExecution jobExecution) { /* do nothing. */ }
 
    @Override
    public void beforeJob(final JobExecution jobExecution) {
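        // Register a JVM shutdown hook that asks Spring Batch to stop this execution.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                try {
                    jobOperator.stop(jobExecution.getId());
                    // Wait until the execution has actually stopped.
                    while (jobExecution.isRunning()) {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt(); // stop waiting if interrupted
                            return;
                        }
                    }
                } catch (NoSuchJobExecutionException | JobExecutionNotRunningException e) { /* ignore */ }
            }
        });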
    }
}

In addition to the provided code, I also had to create a JobRegistryBeanPostProcessor.
Without this post-processor, the jobOperator is not able to find the job
(NoSuchJobException: No job configuration with the name [job1] was registered):

    @Bean
    public JobRegistryBeanPostProcessor jobRegistryBeanPostProcessor(JobRegistry jobRegistry) {
        JobRegistryBeanPostProcessor postProcessor = new JobRegistryBeanPostProcessor();
        postProcessor.setJobRegistry(jobRegistry);
        return postProcessor;
    }

The shutdown hook was not able to write the state to the database, as the database connection was already closed: org.h2.jdbc.JdbcSQLNonTransientConnectionException: Database is already closed (to disable automatic closing at VM shutdown, add ";DB_CLOSE_ON_EXIT=FALSE" to the db URL)

Processing item 2 before
Shutdown Hook is running !
2021-02-08 22:39:48.950  INFO 12676 --- [extShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2021-02-08 22:39:49.218  INFO 12676 --- [extShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.
Processing item 3 before
Exception in thread "Thread-3" org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30004ms.

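To make sure that Spring Boot doesn't close the Hikari datasource pool before the jobs have been stopped, I used a SmartLifecycle as mentioned here. Returning Integer.MAX_VALUE from getPhase() puts the bean in the last phase to start and therefore the first phase to be stopped on context shutdown.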

The final ProcessShutdownListener looks like:

@Component
public class ProcessShutdownListener implements JobExecutionListener, SmartLifecycle {
    private final JobOperator jobOperator;
    public ProcessShutdownListener(JobOperator jobOperator) { this.jobOperator = jobOperator; }

    @Override
    public void afterJob(JobExecution jobExecution) { /* do nothing. */ }

    private static final List<Runnable> runnables = new ArrayList<>();

    @Override
    public void beforeJob(final JobExecution jobExecution) {
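        runnables.add(() -> {
                try {
                    // Nothing to wait for if the stop request could not be sent.
                    if (!jobOperator.stop(jobExecution.getId())) return;
                    // Wait until the execution has actually stopped.
                    while (jobExecution.isRunning()) {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt(); // stop waiting if interrupted
                            return;
                        }
                    }
                } catch (NoSuchJobExecutionException | JobExecutionNotRunningException e) { /* ignore */ }
            });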
    }

    @Override
    public void start() {}

    @Override
    public void stop() {
//        runnables.stream()
//                .parallel()
//                .forEach(Runnable::run);
        runnables.forEach(Runnable::run);
    }

    @Override
    public boolean isRunning() { return true; }

    @Override
    public boolean isAutoStartup() { return true; }

    @Override
    public void stop(Runnable callback) { stop(); callback.run(); }

    @Override
    public int getPhase() { return Integer.MAX_VALUE; }
}

This listener has to be registered when configuring a job:

    @Bean
    public Job job(JobBuilderFactory jobs,
                   ProcessShutdownListener processShutdownListener) {
        return jobs.get("job1")
                .listener(processShutdownListener)
                .start(step(null))
                .build();
    }
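
One gap in the listener above: afterJob does nothing, so stop actions for executions that have already finished stay in the static list, and a plain ArrayList is not safe if jobs start on different threads. Below is a minimal sketch of a registry that could address both points using only the JDK. StopActionRegistry and its method names are hypothetical and not part of Spring Batch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Spring Batch class): one stop action per running execution. */
final class StopActionRegistry {
    private final Map<Long, Runnable> actions = new ConcurrentHashMap<>();

    /** Intended for beforeJob: remember how to stop this execution. */
    void register(long executionId, Runnable stopAction) {
        actions.put(executionId, stopAction);
    }

    /** Intended for afterJob: the execution has ended, so nothing is left to stop. */
    void unregister(long executionId) {
        actions.remove(executionId);
    }

    /** Intended for SmartLifecycle.stop(): runs each remaining action once and returns how many ran. */
    int stopAll() {
        int count = 0;
        for (Long id : actions.keySet()) {
            Runnable action = actions.remove(id); // remove first so an action never runs twice
            if (action != null) {
                action.run();
                count++;
            }
        }
        return count;
    }

    /** Number of executions that still have a pending stop action. */
    int size() {
        return actions.size();
    }
}
```

In ProcessShutdownListener, beforeJob would call register(jobExecution.getId(), ...) with the existing lambda, afterJob would call unregister(jobExecution.getId()), and stop() would call stopAll().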

Finally, as mentioned in the exception output, the flag ;DB_CLOSE_ON_EXIT=FALSE must be added to the JDBC URL.


Solution

  • This approach is the way to go, because shutdown hooks are the only mechanism (to my knowledge) that the JVM offers to intercept external signals. However, it is not guaranteed to work, because the JVM does not guarantee that shutdown hooks are called. Here is an excerpt from the Javadoc of the Runtime.addShutdownHook method:

    In rare circumstances the virtual machine may abort, that is, stop running
    without shutting down cleanly. This occurs when the virtual machine is 
    terminated externally, for example with the SIGKILL signal on Unix or 
    the TerminateProcess call on Microsoft Windows.
    

    Moreover, shutdown hooks are expected to run "quickly":

    Shutdown hooks should also finish their work quickly. When a program invokes
    exit the expectation is that the virtual machine will promptly shut down
    and exit.
    

    In your case, JobOperator.stop involves a database transaction (which might cross a network) to update the job's status, and I'm not sure if this operation is "quick" enough.

    As a side note, there is an example in the samples module called GracefulShutdownFunctionalTests. This example is based on JobExecution.stop which is deprecated, but it will be updated to use JobOperator.stop.
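
Because the wait loop in the question polls isRunning() without any limit, a slow or failed status update could hold up JVM shutdown indefinitely. Below is a minimal JDK-only sketch of a bounded wait. BoundedWait and awaitStopped are hypothetical names, and the timeout and poll interval are values the caller would have to choose.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not a Spring Batch class): polls a condition with an upper time limit. */
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Polls until stillRunning reports false or the timeout elapses.
     * Returns true if the work stopped in time, false on timeout or interrupt.
     */
    static boolean awaitStopped(BooleanSupplier stillRunning, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (stillRunning.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up so that shutdown is not held up indefinitely
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

In the listener, the while loop could then become something like BoundedWait.awaitStopped(jobExecution::isRunning, Duration.ofSeconds(30), Duration.ofMillis(100)), with a log message when it returns false.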