I am trying to process a collection of heavyweight elements (images). The size of the collection varies between 8,000 and 50,000 entries. But for some reason, after processing 1800-1900 entries, my program fails with java.lang.OutOfMemoryError: Java heap space.
As I understand it, each call to session.getTransaction().commit() should free heap memory, but it looks like that never happens. What am I doing wrong? Here is the code:
    private static void loadImages( LoadStrategy loadStrategy ) throws IOException {
        log.info( "Loading images for: " + loadStrategy.getPageType() );
        Session session = sessionFactory.openSession();
        session.setFlushMode( FlushMode.COMMIT );
        Query query = session.createQuery( "from PageRaw where pageType = :pageType and pageStatus = :pageStatus and sessionId = 1" );
        query.setString( "pageStatus", PageStatus.SUCCESS.name() );
        query.setString( "pageType", loadStrategy.getPageType().name() );
        query.setMaxResults( 50 );
        List<PageRaw> pages;
        int resultNum = 0;
        do {
            session.getTransaction().begin();
            log.info( "Get pages starting from " + resultNum + " position" );
            query.setFirstResult( resultNum );
            resultNum += 50;
            pages = query.list();
            log.info( "Found " + pages.size() + " pages" );
            for ( PageRaw pr : pages ) {
                Set<String> imageUrls = new HashSet<>();
                for ( UrlLocator imageUrlLocator : loadStrategy.getImageUrlLocators() ) {
                    imageUrls.addAll(
                        imageUrlLocator.locateUrls( StringConvector.toString( pr.getSourceHtml() ) )
                    );
                }
                removeDeletedImageRaws( pr.getImages(), imageUrls );
                loadNewImageRaws( pr.getImages(), imageUrls );
            }
            session.getTransaction().commit();
        } while ( pages.size() > 0 );
        session.close();
    }
It's important to distinguish these two actions:
flushing a session executes all pending statements against the database (it synchronizes the in-memory state with the database state);
clearing a session purges the session (1st-level) cache, thus freeing memory.
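Committing the transaction only flushes the session; it does not clear it, so every entity loaded so far stays referenced by the session. You therefore need to both flush and clear the session after each batch in order to recover the occupied memory.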
In addition to that, if the 2nd-level cache is enabled, you must disable it for this job. Otherwise all (or most of) the objects will remain reachable even after clearing the session.
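
To make the difference between flushing and clearing concrete, here is a minimal sketch. It does not use Hibernate at all: FakeSession is a hypothetical stand-in that only imitates the relevant behaviour (loaded entities stay referenced until clear() is called), and all class, method and parameter names in it are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchClearSketch {

    /** Hypothetical stand-in for a session; NOT the real Hibernate API. */
    static class FakeSession {
        private final List<Object> firstLevelCache = new ArrayList<>();
        private int pendingWrites = 0;

        /** Loading an entity attaches it to the session (1st-level cache). */
        Object load(int id) {
            Object entity = new byte[1024]; // pretend this is a heavy image row
            firstLevelCache.add(entity);
            pendingWrites++;
            return entity;
        }

        /** Pushes pending changes to the "database"; releases no entities. */
        void flush() {
            pendingWrites = 0;
        }

        /** Detaches every entity so it becomes eligible for garbage collection. */
        void clear() {
            firstLevelCache.clear();
        }

        int cachedEntities() {
            return firstLevelCache.size();
        }
    }

    /** Processes total entities in batches and returns the peak cache size seen. */
    static int processInBatches(FakeSession session, int total, int batchSize,
                                boolean clearAfterBatch) {
        int peak = 0;
        for (int first = 0; first < total; first += batchSize) {
            int last = Math.min(first + batchSize, total);
            for (int id = first; id < last; id++) {
                session.load(id);
            }
            peak = Math.max(peak, session.cachedEntities());
            session.flush();          // write pending changes first
            if (clearAfterBatch) {
                session.clear();      // then drop the references
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Flush only: the cache keeps growing with every batch.
        System.out.println(processInBatches(new FakeSession(), 200, 50, false)); // prints 200
        // Flush and clear: the cache never holds more than one batch.
        System.out.println(processInBatches(new FakeSession(), 200, 50, true));  // prints 50
    }
}
```

In the loop from the question, the equivalent change would presumably be to call session.flush() followed by session.clear() right before session.getTransaction().commit(), so that each batch of 50 pages is released before the next one is loaded.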