jdbcbatch-filelarge-data-volumesout-of-memory

JDBC Batch Insert OutOfMemoryError


I have written a method insert() in which I am trying to use JDBC Batch for inserting half a million records into a MySQL database:

public void insert(int nameListId, String[] names) {
    String sql = "INSERT INTO name_list_subscribers (name_list_id, name, date_added)" + 
        " VALUES (?, ?, NOW())";
    Connection conn = null;
    PreparedStatement ps = null;

    try {
        conn = getConnection();
        ps = conn.prepareStatement(sql);

        for (String s : names ) {
            ps.setInt(1, nameListId); 
            ps.setString(2, s);
            ps.addBatch();
        }

        ps.executeBatch();

    } catch (SQLException e) {
        throw new RuntimeException(e);
    } finally {
        closeDbResources(ps, null, conn);
    }
}

But whenever I try to run this method, I get the following error:

java.lang.OutOfMemoryError: Java heap space
    com.mysql.jdbc.ServerPreparedStatement$BatchedBindValues.<init>(ServerPreparedStatement.java:72)
    com.mysql.jdbc.ServerPreparedStatement.addBatch(ServerPreparedStatement.java:330)
    org.apache.commons.dbcp.DelegatingPreparedStatement.addBatch(DelegatingPreparedStatement.java:171)

If I replace ps.addBatch() with ps.executeUpdate() and remove ps.executeBatch(), it works fine, though it takes some time. Please let me know if you know if using Batch is appropriate in this situation, and if it is, then why does it give OurOfMemoryError?

Thanks


Solution

  • addBatch and executeBatch give you the mechanism to perform batch inserts, but you still need to do the batching algorithm yourself.

    If you simply pile every statement into the same batch, as you are doing, then you'll run out of memory. You need to execute/clear the batch every n records. The value of n is up to you, JDBC can't make that decision for you. The larger the batch size, the faster things will go, but too large and you'll get memory starvation and things will slow down or fail. It depends how much memory you have.

    Start off with a batch size of 1000, for example, and experiment with different values from there.

    final int batchSize = 1000;
    int count = 0;
    for(String s : names ) {
       ps.setInt(1, nameListId); 
       ps.setString(2, s);
       ps.addBatch();
    
       if (++count % batchSize == 0) {
          ps.executeBatch();
          ps.clearBatch(); //not sure if this is necessary
       }
    }
    ps.executeBatch();   // flush the last few records.