
Spring Cloud Data Flow : Value too long for column "TASK_NAME" (DeployerPartitionHandler with Spring Batch)


I have a simple Spring Batch job running on Kubernetes as a Spring Cloud Task. The job uses Spring Batch partitioning to launch the partitioned steps as task pods on the same Kubernetes cluster.

Main job (relevant parts):

@Bean
public Job mainControllerJob() {
    LOGGER.info("Creating mainControllerJob bean...");
    return jobBuilderFactory.get("mainControllerJob").incrementer(new RunIdIncrementer())
                .start(testStep(null, null, null)).build();
}

@Bean
public Step testStep(StepBuilderFactory stepBuilderFactory,
        @Qualifier("testPartitioner") Partitioner partitioner, PartitionHandler partitionHandler) {
    LOGGER.info("Creating testStep");
    return stepBuilderFactory.get("testStep")
            .partitioner("testWorkerStep", partitioner).partitionHandler(partitionHandler).build();
}

@Bean
public DeployerPartitionHandler partitionHandler(@Value("${test.partion.app}") String resourceLocation,
        @Value("${test.application.name}") String applicationName, ApplicationContext context,
        TaskLauncher taskLauncher, JobExplorer jobExplorer, DockerResourceLoader dockerResourceLoader) {
    Resource resource = dockerResourceLoader.getResource(resourceLocation);
    DeployerPartitionHandler partitionHandler = new DeployerPartitionHandler(taskLauncher, jobExplorer, resource,
            "testWorkerStep", taskRepository);

    List<String> commandLineArgs = new ArrayList<>();
    commandLineArgs.add("--spring.cloud.task.initialize.enable=false");
    commandLineArgs.add("--spring.batch.initializer.enabled=false");

    commandLineArgs.addAll(Arrays.stream(applicationArguments.getSourceArgs()).filter(
            x -> !x.startsWith("--spring.profiles.active=") && !x.startsWith("--spring.cloud.task.executionid="))
            .collect(Collectors.toList()));
    commandLineArgs.addAll(applicationArguments.getNonOptionArgs());

    partitionHandler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArgs));
    partitionHandler.setEnvironmentVariablesProvider(new NoOpEnvironmentVariablesProvider());
    partitionHandler.setMaxWorkers(maxWorkers);
    partitionHandler.setGridSize(gridSize);
    partitionHandler.setApplicationName(applicationName);

    return partitionHandler;
}

When the child step (task pod) is launched on Kubernetes by the above job, I see the following exception for the child step:

 org.springframework.context.ApplicationContextException: Failed to start bean 'taskLifecycleListener'; nested exception is org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [UPDATE TASK_EXECUTION set START_TIME = ?, TASK_NAME = ?, LAST_UPDATED = ?, PARENT_EXECUTION_ID = ? where TASK_EXECUTION_ID = ?]; 
Value too long for column """TASK_NAME"" VARCHAR(100)": "'test-app,test-app_testJob_testWorkerStep:partition5800' (106)"; 
SQL statement:
    UPDATE TASK_EXECUTION set START_TIME = ?, TASK_NAME = ?, LAST_UPDATED = ?, PARENT_EXECUTION_ID = ? where TASK_EXECUTION_ID = ? [22001-199]; nested exception is org.h2.jdbc.JdbcSQLDataException: Value too long for column """TASK_NAME"" VARCHAR(100)"

The error is clear: I need to shorten the name of the child partition step so that it fits within the width of the TASK_NAME column (VARCHAR(100)) of the internal TASK_EXECUTION table.

What I want to understand is how I can change the name of the child partition step launched by a Spring Batch job myself.

I roughly understand that the step name is created by SimpleStepExecutionSplitter. However, I don't see a way to override this behavior programmatically. How can I change the child step name in a way that does not affect the restartability of my jobs and steps?


Solution

  • The worker step name is the concatenation of the step name and the partition name, with a separator in between. Partition names can be customized using a PartitionNameProvider, so if you want control over how partition names are generated, make your Partitioner implement the PartitionNameProvider interface.

    Another option is to increase the size of the TASK_NAME column.
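
To make the first option concrete, here is a minimal sketch of a partitioner that produces short, deterministic partition names. The class name ShortNamePartitioner, the "p" prefix, and the "partitionIndex" key are hypothetical and chosen only for illustration. So that the sketch compiles with only the JDK, Partitioner, PartitionNameProvider, and ExecutionContext are declared locally as stand-ins that mirror the method signatures of the Spring Batch types; in a real project you would delete the stand-ins and import the real ones instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-ins (assumption: only here so the sketch compiles without Spring Batch).
// In a real project, remove these three types and import Partitioner and
// PartitionNameProvider from org.springframework.batch.core.partition.support
// and ExecutionContext from org.springframework.batch.item.
interface Partitioner {
    Map<String, ExecutionContext> partition(int gridSize);
}

interface PartitionNameProvider {
    Collection<String> getPartitionNames(int gridSize);
}

final class ExecutionContext {
    private final Map<String, Object> values = new HashMap<>();

    void putInt(String key, int value) {
        values.put(key, value);
    }

    int getInt(String key) {
        return (Integer) values.get(key);
    }
}

// Hypothetical partitioner that keeps partition names short and deterministic.
class ShortNamePartitioner implements Partitioner, PartitionNameProvider {

    // Hypothetical short prefix, used instead of a longer one such as "partition".
    private static final String PREFIX = "p";

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // The map keys are the partition names; insertion order is preserved.
        Map<String, ExecutionContext> partitions = new LinkedHashMap<>();
        int index = 0;
        for (String name : getPartitionNames(gridSize)) {
            ExecutionContext context = new ExecutionContext();
            // Hypothetical key that a worker step could read to find its slice of work.
            context.putInt("partitionIndex", index++);
            partitions.put(name, context);
        }
        return partitions;
    }

    @Override
    public Collection<String> getPartitionNames(int gridSize) {
        // Names depend only on gridSize, so repeated calls yield the same names.
        List<String> names = new ArrayList<>(gridSize);
        for (int i = 0; i < gridSize; i++) {
            names.add(PREFIX + i);
        }
        return names;
    }
}
```

Both methods build names from the same source, so the keys returned by partition() always match getPartitionNames(). As far as I understand, that consistency is what keeps restarts safe, because the names are what the splitter uses to match new step executions to the ones from the failed run. With this naming, a worker step such as testWorkerStep:partition5800 would become testWorkerStep:p5800, which saves eight characters; whether that is enough to fit TASK_NAME depends on your real application, job, and step names.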