java jsr352 jberet

How to distribute work correctly in JSR-352?


I have been using Java Batch Processing for some time now. My jobs were either import/export jobs that chunked from a reader to a writer, or Batchlets that did more complex processing. Since I am beginning to hit memory limits, I need to rethink the architecture.

I want to make better use of the chunked Reader/Processor/Writer pattern, but I am unsure how to distribute the work across the three components. Only during processing does it become clear whether zero, one, or several other records need to be written.

The reader is quite clear: It reads the data to be processed from the DB. But I am unsure how to write the records back to the database. I see these options:

Which way would be the best for this kind of task?


Solution

  • Looking at https://www.ibm.com/support/pages/system/files/inline-files/WP102706_WLB_JSR352.002.pdf, especially the chapters Chunk/The Processor and Chunk/The Writer, it becomes clear that how the work is divided is up to me.

    The processor can return an object, and the writer will have to understand and write this object. So for the above case where the processor has zero, one or many items to write per input record, it should simply return a list. This list can contain zero, one or several elements. The writer has to understand the list and write its elements to the database.

    Since the logic is divided this way, the code is still pluggable and can easily be extended or maintained.

    Addendum: Since the reader and the writer both connect to the same database this time, I ran into the problem that the reader's connection was also invalidated on every chunk commit. The solution was to use a non-JTA datasource for the reader.
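
The processor/writer split described in this answer could look roughly like the sketch below. It is only an illustration under a few assumptions: the two small interfaces are local stand-ins that mirror the `processItem` and `writeItems` signatures of `javax.batch.api.chunk.ItemProcessor` and `ItemWriter`, so the example compiles without a batch runtime; the class names `FanOutProcessor` and `FlatteningWriter`, the comma-splitting rule, and the in-memory list that stands in for the database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the JSR-352 chunk interfaces (assumption: only the
// methods relevant here are mirrored; the real ItemWriter also declares
// open, close and checkpointInfo).
interface ItemProcessor {
    Object processItem(Object item) throws Exception;
}

interface ItemWriter {
    void writeItems(List<Object> items) throws Exception;
}

// Hypothetical processor: turns one input record into zero, one or many
// output records and hands them to the writer as a single List.
class FanOutProcessor implements ItemProcessor {
    @Override
    public Object processItem(Object item) {
        String line = (String) item;
        List<String> out = new ArrayList<>();
        // Illustrative rule only: every non-blank comma-separated part
        // becomes its own output record.
        for (String part : line.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        // In JSR-352 a null result filters the item, so the writer never
        // sees inputs that produced nothing.
        return out.isEmpty() ? null : out;
    }
}

// Hypothetical writer: each chunk item is itself a List, so it is
// flattened before the individual records are written.
class FlatteningWriter implements ItemWriter {
    // Stands in for the database table.
    final List<String> written = new ArrayList<>();

    @Override
    public void writeItems(List<Object> items) {
        for (Object item : items) {
            @SuppressWarnings("unchecked")
            List<String> records = (List<String>) item;
            written.addAll(records);
        }
    }
}
```

In a real job the classes would implement the `javax.batch.api.chunk` interfaces (or extend `AbstractItemWriter`), and `writeItems` would issue the inserts for the flattened records, ideally as one batch per chunk. Returning an empty list instead of null would also work, as long as the writer tolerates empty lists.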