java, spring, spring-batch

Spring Batch: write multiple chunks at once / aggregate writing of chunks after reaching a threshold


I am using Spring Batch for the first time.

Currently, my setup is as follows:

Basically, 25 items are read from the Kafka topic for one chunk. Each Kafka item contains 1..n child items, and this number is not fixed; it differs for every Kafka item read. My processor flattens each Kafka item, so it returns a list of the child items. The writer then receives the child items of the full chunk. So far so good.
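For illustration, here is a minimal sketch of that setup. `KafkaItem` and `ChildItem` are placeholder types standing in for my real payloads, and I am using the pre-Spring-Batch-5 `ItemWriter` signature:

```java
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;

// Hypothetical stand-ins for my real payload types.
class KafkaItem {
    private List<ChildItem> children;
    List<ChildItem> getChildren() { return children; }
}
class ChildItem { }

// Flat-maps one Kafka item into its 1..n child items.
class FlatteningProcessor implements ItemProcessor<KafkaItem, List<ChildItem>> {
    @Override
    public List<ChildItem> process(KafkaItem item) {
        return item.getChildren();
    }
}

// Receives one child-item list per Kafka item in the chunk
// and writes them all at once.
class ChildItemWriter implements ItemWriter<List<ChildItem>> {
    @Override
    public void write(List<? extends List<ChildItem>> chunk) {
        List<ChildItem> children = chunk.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
        // children.size() varies: 150 items one chunk, 50 the next
        // JDBC batch insert into Postgres goes here
    }
}
```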

This means that every time my writer writes to the database, the number of items actually written can differ: one chunk might produce 150 items, the next only 50, and so on.

Is there a way to create a writer that only writes to the database when a specific child-item threshold is reached? For example, I want the writer to write the child items only once a minimum count of 1000 items has accumulated. That way I hope to improve write performance against my Postgres database. Basically, I want to aggregate the write operation across multiple chunks while still maintaining the commit behaviour of Spring Batch.

I had the idea of writing a custom writer that uses an internal buffer, roughly like the sketch below. But I can't get my head around the commit/transaction behaviour of Spring Batch: how can I tell Spring Batch to only commit items once the writer has actually flushed them after reaching the threshold? Maybe it's just not possible.
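This is roughly the buffering writer I had in mind (a sketch only; `THRESHOLD` is a made-up constant and `ChildItem` is the same placeholder type as above):

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ItemWriter;

// Sketch of the buffering idea -- not a working solution.
class BufferingChildItemWriter implements ItemWriter<List<ChildItem>> {
    private static final int THRESHOLD = 1000;
    private final List<ChildItem> buffer = new ArrayList<>();

    @Override
    public void write(List<? extends List<ChildItem>> chunk) {
        chunk.forEach(buffer::addAll);
        if (buffer.size() >= THRESHOLD) {
            // batch insert 'buffer' into Postgres, then:
            buffer.clear();
        }
        // but: items from already-committed chunks may still sit in
        // 'buffer' here -- which transaction do they belong to?
    }
}
```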


Solution

  • That's not possible. The chunk size is the number of items to read/process/write in a single transaction. If the processor produces more items (like a flat-map operation), there is no way to limit the number of items at the writer level. Buffering items across chunks in a custom writer would break that contract: Spring Batch would consider items from already-committed chunks as written while they are still sitting in memory, so a failure before the flush would silently lose them.
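For reference, the commit interval is exactly the chunk size passed to the step definition. A Spring Batch 4-style configuration for the setup above would look like this (bean and step names are illustrative):

```java
import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

// The chunk size passed to chunk() is the commit interval: each
// read/process/write cycle over 25 Kafka items is one transaction.
@Bean
public Step childItemStep(StepBuilderFactory steps,
                          ItemReader<KafkaItem> reader,
                          ItemProcessor<KafkaItem, List<ChildItem>> processor,
                          ItemWriter<List<ChildItem>> writer) {
    return steps.get("childItemStep")
            .<KafkaItem, List<ChildItem>>chunk(25)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
```

The transaction for a chunk commits as soon as that writer call returns; there is no hook to hold items back and commit them together with a later chunk.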