I have a Spring Batch application that transfers data from one database to another (the target database is PostgreSQL and has enough space).
I'm executing the following query:
SELECT s.name, s.brand
FROM shop s
ORDER BY s.name, s.brand;
I'm using JdbcCursorItemReader.
The number of rows in my source data is larger than the Java int range (more than 2,147,483,647 rows).
When I check the results in the target database, it turns out that only records up to the int limit were transferred.
I don't see any error messages in the logs.
My questions:
Does Spring Batch internally impose any int-based limit?
Does anyone know whether JdbcPagingItemReader is able to handle a fixed dataset where the number of rows exceeds the int range?
@Slf4j // assuming a Lombok logger, since `log` is used below
@Configuration
public class DataToGroupInitialFlowConfig {

    public final String INIT_READ_DATATOGROUP_TO_DATE_SQL =
            "select dg.groupid, dg.datagroupid from WI_DATATOGROUP_BKP dg order by dg.groupid, dg.datagroupid";
    public final String WRITE_DATATOGROUP_SQL =
            "INSERT INTO \"DATATOGROUP\" (\"groupid\", \"datagroupid\") VALUES (:groupid, :datagroupid)";

    @Autowired
    @Qualifier("dataSource")
    private DataSource postgresqlDatasource;

    @Autowired
    @Qualifier("oracleDataSource")
    private DataSource oracleDataSource;

    @Autowired
    private JobRepository jobRepository;

    @Autowired
    private PlatformTransactionManager transactionManager;

    @Bean
    public Flow getDataToGroupInitialFlow() {
        log.info("Is new group flow");
        return new FlowBuilder<SimpleFlow>("getDataInitialToGroup")
                .start(getDataToGroupInitStep())
                .on("COMPLETED").end()
                .on("FAILED").fail()
                .build();
    }

    @Bean
    public Step getDataToGroupInitStep() {
        return new StepBuilder("getDataToGroupInitStep", jobRepository)
                .<DataToGroup, DataToGroup>chunk(300_000, transactionManager)
                .listener(new ItemWriteListener<>() {
                    @Override
                    public void onWriteError(Exception ex, Chunk<? extends DataToGroup> items) {
                        log.error("Error during writing data to group - chunk level: {} ",
                                items.getItems().stream().map(DataToGroup::toString)
                                        .collect(Collectors.joining("|")));
                        log.error("Error during writing data to group - chunk level - exception: ", ex);
                        ItemWriteListener.super.onWriteError(ex, items);
                    }
                })
                .listener(new ItemReadListener<>() {
                    @Override
                    public void onReadError(Exception ex) {
                        log.error("Error during reading data to group - chunk level - exception: ", ex);
                        ItemReadListener.super.onReadError(ex);
                    }
                })
                .listener(new RetryListener() {
                    @Override
                    public <T, E extends Throwable> void close(RetryContext context, RetryCallback<T, E> callback, Throwable throwable) {
                        log.error("Occurred error during last retry: ", throwable);
                        RetryListener.super.close(context, callback, throwable);
                    }

                    @Override
                    public <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback, Throwable throwable) {
                        log.error("Occurred error during retry: ", throwable);
                        RetryListener.super.onError(context, callback, throwable);
                    }
                })
                .reader(getDataToGroupInitialReader())
                .writer(getDataToGroupInitialWriter())
                .build();
    }

    @Bean
    @StepScope
    public JdbcCursorItemReader<DataToGroup> getDataToGroupInitialReader() {
        JdbcCursorItemReader<DataToGroup> reader = new LoggingJdbcCursorItemReader<>();
        reader.setSql(INIT_READ_DATATOGROUP_TO_DATE_SQL);
        reader.setDataSource(oracleDataSource);
        reader.setRowMapper((ResultSet rs, int rowNum) -> DataToGroup.builder()
                .groupid(rs.getString("groupid"))
                .datagroupid(rs.getString("datagroupid"))
                .build());
        return reader;
    }

    @Bean
    public JdbcBatchItemWriter<DataToGroup> getDataToGroupInitialWriter() {
        JdbcBatchItemWriter<DataToGroup> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(postgresqlDatasource);
        writer.setSql(WRITE_DATATOGROUP_SQL);
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        return writer;
    }
}
The issue might be anywhere, and even if you can find the exact cause of this particular failure in your particular stack, it won't help. There are probably multiple places where this will go wrong.
Java's arrays can only hold Integer.MAX_VALUE elements, and all collections built on top of a single array (which includes all the usual ones) inherit this limitation. Since the whole Java ecosystem uses the standard collections without caring much about this limit, the only safe way to process more than Integer.MAX_VALUE elements in Java is paging and chunking at a high level.
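To make the limit concrete, here is a tiny, self-contained illustration (the 3-billion row count is invented for the example): array lengths and Collection.size() are plain ints, so a larger count silently wraps around.

public class IntLimitDemo {
    public static void main(String[] args) {
        long rowCount = 3_000_000_000L;  // e.g. a source table with ~3 billion rows
        int truncated = (int) rowCount;  // silently wraps around to -1_294_967_296
        System.out.println(truncated);
        // new Object[truncated];        // would throw NegativeArraySizeException
    }
}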
So you would have to read your data page by page, using offset & limit, and write each page into the target database one after another.
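If you want to stay inside Spring Batch for that, a paging reader could look roughly like the sketch below. It reuses the Oracle source, the WI_DATATOGROUP_BKP columns, and the DataToGroup mapping from the question's config; the bean name is made up, and whether the paging reader's own counters survive more than Integer.MAX_VALUE rows is exactly the open question, so treat this as a starting point rather than a verified fix.

@Bean
@StepScope
public JdbcPagingItemReader<DataToGroup> getDataToGroupPagingReader() throws Exception {
    // Generates page queries bounded by the sort keys, so each page is a
    // separate, bounded query instead of one long-running cursor.
    SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean();
    provider.setDataSource(oracleDataSource);
    provider.setSelectClause("select dg.groupid, dg.datagroupid");
    provider.setFromClause("from WI_DATATOGROUP_BKP dg");

    // Order matters for a compound sort key, hence the LinkedHashMap.
    Map<String, Order> sortKeys = new LinkedHashMap<>();
    sortKeys.put("groupid", Order.ASCENDING);
    sortKeys.put("datagroupid", Order.ASCENDING);
    provider.setSortKeys(sortKeys);

    JdbcPagingItemReader<DataToGroup> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(oracleDataSource);
    reader.setQueryProvider(provider.getObject());
    reader.setPageSize(300_000);
    reader.setRowMapper((rs, rowNum) -> DataToGroup.builder()
            .groupid(rs.getString("groupid"))
            .datagroupid(rs.getString("datagroupid"))
            .build());
    return reader;
}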