Tags: java, jdo, datanucleus

Reducing memory usage for batch JDO inserts


We have a Java web app that uses DataNucleus 1.1.4 / JDO 2.3 for persistence.

There's a batch import operation that persists a large number of JDO objects in one shot. We've had situations where OutOfMemoryErrors are thrown because the data to import is so large.

The intended pattern was to loop through the input stream, parse a row, instantiate a JDO object, call makePersistent, and then release the reference to the JDO object so that our memory footprint would stay flat regardless of the input data size.
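For reference, a minimal sketch of that loop, assuming a hypothetical ImportRecord JDO entity and parseRow helper (stand-ins for the real import types):

    import java.io.BufferedReader;

    import javax.jdo.PersistenceManager;
    import javax.jdo.PersistenceManagerFactory;
    import javax.jdo.Transaction;

    public class BatchImport {

        // ImportRecord and parseRow(...) are hypothetical stand-ins for the
        // real JDO entity class and row parser used by the import.
        void importAll(PersistenceManagerFactory pmf, BufferedReader reader) throws Exception {
            PersistenceManager pm = pmf.getPersistenceManager();
            Transaction tx = pm.currentTransaction();
            try {
                tx.begin();
                String line;
                while ((line = reader.readLine()) != null) {
                    ImportRecord record = parseRow(line);
                    pm.makePersistent(record);
                    record = null; // we drop our reference, but the framework still tracks the dirty instance
                }
                tx.commit();
            } finally {
                if (tx.isActive()) {
                    tx.rollback();
                }
                pm.close();
            }
        }
    }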

In doing some heap analysis during this operation, it appears that the JDO object instances pile up and occupy a large chunk of memory until the commit happens. Even though we don't hold references to them, DataNucleus's PersistenceManager and Transaction implementations reference an org.datanucleus.ObjectManagerImpl object that holds onto a list of "dirty" JDO object instances (actually copies of the originals). There's probably a good rationale for this, but I was a bit surprised that the framework needed to hold onto copies of each JDO object. They are released after the commit, but since we want all insertions to happen atomically, we need to run this operation inside a single transaction. In its current state, memory usage grows linearly with the input data size, which exposes us to these OutOfMemoryErrors - if not for a single operation, then under concurrent operations.

Are there any tips or best practices for keeping a flat (or near-flat) memory footprint during a batch JDO insert operation like this?


Solution

  • What I found was that the best practice is to call the PersistenceManager's flush method periodically inside the loop. This causes the JDO framework (ObjectManagerImpl) to release the dirty object instances it was holding.
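A minimal sketch of the import loop with the periodic flush, again using the hypothetical ImportRecord/parseRow names from above; the flush interval of 1000 rows is an assumption to tune against your data and heap size:

    PersistenceManager pm = pmf.getPersistenceManager();
    Transaction tx = pm.currentTransaction();
    try {
        tx.begin();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            ImportRecord record = parseRow(line); // hypothetical parser
            pm.makePersistent(record);
            if (++count % 1000 == 0) {
                // Push pending inserts to the datastore so the framework can
                // stop holding the dirty instances accumulated so far.
                pm.flush();
            }
        }
        tx.commit(); // all rows still commit (or roll back) atomically
    } finally {
        if (tx.isActive()) {
            tx.rollback();
        }
        pm.close();
    }

Because everything still runs inside one transaction, atomicity is preserved; the flush only moves the pending work to the datastore earlier, so memory usage is bounded by the flush interval rather than by the total input size.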