I have written a single-threaded Spring Batch job to fix data in a table column, i.e. a simple job with no parallel or multi-threaded steps. It uses a JdbcCursorItemReader to "stream" the data, and a writer fixes and updates each row with the correct value. The main job step looks something like this:
@Bean("orderidFixerStep")
@JobScope
public Step orderidFixerStep(final JdbcCursorItemReader<Order> orderIdReader,
                             final OrderIdItemWriter orderIdItemWriter,
                             final JobRepository jobRepository,
                             final PlatformTransactionManager transactionManager,
                             final AppProperties appProperties) {
    // Note: chunk() should only be called once; the commit interval and
    // transaction manager are both set in the same call.
    return new StepBuilder("orderidFixerStep", jobRepository)
            .<Order, Order>chunk(appProperties.getBatchSizeToValidate(), transactionManager)
            .reader(orderIdReader)
            .writer(orderIdItemWriter)
            .build();
}
What I am worried about is that I am reading from and writing back to the same table. Could this lead to any inconsistencies?

I don't think so, because the JdbcCursorItemReader returns a ResultSet (TYPE_FORWARD_ONLY is the JDBC default) and the data is fetched through a cursor, so rows already fetched into the ResultSet would not change even while I am updating them. Also, I am only updating rows that I have already read within the current chunk, so even if the ResultSet did change it should not be a problem. The same goes for any new rows inserted while the job runs: they will not be processed (and, per the requirements, don't need to be), since the ResultSet will not contain them.

Is my understanding correct? Is there a corner case or anything else I am missing or may have overlooked? In my testing it seems to work fine, but I wanted to check whether anything could go wrong.
More details about the data:

I have a column that stores a hash value for every order ID, i.e. one hash per order ID. The order ID column itself is in a different table. The order ID values in some of the rows have, for some reason, become corrupted, so I need to loop over all the order IDs, calculate the hash using some algorithm, and check whether it matches the corresponding hash value in the other table. If it doesn't match, I need to recover the order ID from a third column by parsing the JSON stored in it, which contains the original order ID. Neither the hash, the JSON, nor the order ID changes; all of this data is read-only, and all of the tables are in the same database.
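The per-row fix described above can be sketched as plain Java. This is illustrative only: the actual hash algorithm (SHA-256 here), the JSON field name ("orderId"), and the regex-based extraction are all assumptions; a real implementation would use the project's own hashing and a JSON library such as Jackson.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderIdFixer {

    // Hash algorithm is an assumption; substitute whatever produced the stored hashes.
    public static String sha256Hex(String value) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(value.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the corrected order ID, or the input unchanged if its hash matches.
    public static String fixOrderId(String orderId, String storedHash, String json) throws Exception {
        if (sha256Hex(orderId).equals(storedHash)) {
            return orderId; // hash matches: row is intact
        }
        // Hash mismatch: recover the original order ID from the JSON column.
        // A regex stands in for a real JSON parser to keep the sketch dependency-free.
        Matcher m = Pattern.compile("\"orderId\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        if (m.find()) {
            return m.group(1);
        }
        throw new IllegalStateException("No orderId found in JSON column");
    }

    public static void main(String[] args) throws Exception {
        String hash = sha256Hex("ORD-1");
        // Corrupted row: stored value no longer matches the hash, so fall back to the JSON.
        System.out.println(fixOrderId("ORD-CORRUPT", hash, "{\"orderId\":\"ORD-1\"}")); // prints ORD-1
    }
}
```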
By default, the cursor is opened in its own connection and does not participate in any transactions started for the rest of the step processing. To make the cursor share the step's transactional connection, you need to set useSharedExtendedConnection on the JdbcCursorItemReader and wrap the DataSource in an ExtendedConnectionDataSourceProxy. There is a note about this in the javadoc here.
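A minimal sketch of that wiring, assuming an existing `dataSource` bean and an `OrderRowMapper`; the SQL and bean names are illustrative, not from the question:

```java
// Sketch only: assumes Spring Batch is on the classpath.
@Bean
public ExtendedConnectionDataSourceProxy extendedDataSource(DataSource dataSource) {
    // The proxy keeps a single connection open across the step so the cursor
    // and the chunk transactions can share it.
    return new ExtendedConnectionDataSourceProxy(dataSource);
}

@Bean
public JdbcCursorItemReader<Order> orderIdReader(ExtendedConnectionDataSourceProxy extendedDataSource) {
    JdbcCursorItemReader<Order> reader = new JdbcCursorItemReader<>();
    reader.setDataSource(extendedDataSource);
    reader.setSql("SELECT id, order_id, hash, payload_json FROM orders"); // illustrative SQL
    reader.setRowMapper(new OrderRowMapper()); // hypothetical mapper
    reader.setUseSharedExtendedConnection(true); // cursor shares the step's connection
    reader.setName("orderIdReader");
    return reader;
}
```

Note that the step's transaction manager must also use the proxied DataSource, otherwise the connection is not actually shared between the cursor and the chunk transactions.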