I'm using Spring Batch with XML-based configuration in my project. I implemented the logic by taking reference from: Spring Batch With Annotation and Caching.
Currently we're reading a FlatFile that has Customer Booking information (roughly 14 fields per record). The volume of data in the cache is 1 million records, and we use the org.springframework.cache.concurrent.ConcurrentMapCacheManager implementation.
Now I have request data with 15 million records in a FlatFile (60 fields per record). In the processor, we look up each item in the cache and build the final object from it.
private Optional<SomeBooking> performLookup(List<SomeBooking> existingCacheData, SomeItem item) {
    return existingCacheData.stream()
        .filter(booking -> item.getId().equals(booking.getId())
                && item.getSomeNum().equals(booking.getSomeNum()))
        .findAny();
}
Another code snippet:
List<SomeBooking> existingCacheData = (List<SomeBooking>) cacheManager.getCache("reference-data").get("data").get();
Optional<SomeBooking> opBooking = performLookup(existingCacheData, item);
if (opBooking.isPresent()) {
    SomeBooking booking = opBooking.get();
    ..........
    .........
}
The lookup from the cache takes roughly 10 ms to 30 ms per item, and processing 15 million records takes around 10 hours, which is not an acceptable figure. Each final output record now has 74 fields. Observed timings:
Time taken: 145 milliseconds
Time taken: 143 milliseconds
Time taken: 89 milliseconds
Time taken: 133 milliseconds
Time taken: 141 milliseconds
Time taken: 58 milliseconds
Time taken: 67 milliseconds
Time taken: 134 milliseconds
Time taken: 131 milliseconds
Time taken: 142 milliseconds
Time taken: 117 milliseconds
Time taken: 140 milliseconds
Time taken: 84 milliseconds
Time taken: 86 milliseconds
Time taken: 133 milliseconds
Time taken: 107 milliseconds
Time taken: 86 milliseconds
Time taken: 38 milliseconds
Time taken: 75 milliseconds
Time taken: 125 milliseconds
Time taken: 76 milliseconds
Time taken: 84 milliseconds
Time taken: 132 milliseconds
Time taken: 68 milliseconds
Time taken: 135 milliseconds
Time taken: 97 milliseconds
How can we improve the performance? We're reading the FlatFile and creating multiple output files (using org.springframework.batch.item.support.ClassifierCompositeItemWriter) when certain conditions match. We also use org.springframework.batch.item.file.MultiResourceItemWriter to create multiple versions of the FlatFile when itemCountLimitPerResource is reached. So we're already using fairly complex beans in our project.
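For context, the writer setup described above can be sketched in XML configuration; the bean ids, classifier reference, file paths, and item count below are placeholders, not the actual project values:

```xml
<!-- Sketch only: bean ids, the classifier ref, paths, and counts are assumptions. -->
<bean id="classifierWriter"
      class="org.springframework.batch.item.support.ClassifierCompositeItemWriter">
    <!-- Routes each item to one of several delegate writers when a condition matches -->
    <property name="classifier" ref="bookingClassifier"/>
</bean>

<bean id="multiResourceWriter"
      class="org.springframework.batch.item.file.MultiResourceItemWriter">
    <property name="resource" value="file:output/bookings"/>
    <property name="delegate" ref="flatFileItemWriter"/>
    <!-- Rolls over to a new output file once this many items have been written -->
    <property name="itemCountLimitPerResource" value="500000"/>
</bean>
```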
It is not clear from your post how big the list 'existingCacheData' is, but it seems counter-intuitive to cache a list and then iterate it on every lookup. If the list is large enough that the amortized constant-time performance of a Map beats the average linear search, one would normally cache a Map keyed on getId() and getSomeNum() for this use case. Alternatively, you could cache the individual objects and retrieve them directly from the cache with a similarly derived key.
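As a minimal sketch of the Map approach: the `SomeBooking` stand-in below, its field types, and the `"|"`-delimited composite key are assumptions, but the idea is to build the index once when the reference data is loaded, then replace the linear stream scan with an O(1) `Map.get`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Minimal stand-in for the cached booking type; field names/types are assumptions.
class SomeBooking {
    private final String id;
    private final Long someNum;
    SomeBooking(String id, Long someNum) { this.id = id; this.someNum = someNum; }
    String getId() { return id; }
    Long getSomeNum() { return someNum; }
}

public class BookingIndex {

    // Composite key derived from the two lookup fields.
    private static String keyOf(String id, Long someNum) {
        return id + "|" + someNum;
    }

    // Build the index once, e.g. when the reference data is loaded into the cache.
    static Map<String, SomeBooking> index(List<SomeBooking> bookings) {
        Map<String, SomeBooking> index = new HashMap<>(bookings.size() * 2);
        for (SomeBooking b : bookings) {
            index.put(keyOf(b.getId(), b.getSomeNum()), b);
        }
        return index;
    }

    // O(1) lookup replacing the linear stream scan in performLookup.
    static Optional<SomeBooking> lookup(Map<String, SomeBooking> index, String id, Long someNum) {
        return Optional.ofNullable(index.get(keyOf(id, someNum)));
    }

    public static void main(String[] args) {
        Map<String, SomeBooking> idx = index(List.of(
                new SomeBooking("A1", 10L), new SomeBooking("B2", 20L)));
        System.out.println(lookup(idx, "A1", 10L).isPresent()); // found
        System.out.println(lookup(idx, "A1", 99L).isPresent()); // not found
    }
}
```

You would cache the `Map` itself under the `"data"` key instead of the `List`, so each of the 15 million processor calls does a single hash lookup rather than scanning up to 1 million entries.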