javaperformancehibernate3

high performance hibernate insert


I am working on a latency sensitive part of an application, basically i will receive a network event transform the data and then insert all the data into the DB. After profiling i see that basically all my time is spent trying to save the data. here is the code

private void insertAllData(Collection<Data> dataItems)
{
    long start_time = System.currentTimeMillis();
    long save_time = 0;
    long commit_time = 0;
    Transaction tx = null;
    try
    {
        Session s = HibernateSessionFactory.getSession();
        s.setCacheMode(CacheMode.IGNORE);
        s.setFlushMode(FlushMode.NEVER);
        tx = s.beginTransaction();
        for(Data data : dataItems)
        {
            s.saveOrUpdate(data);
        }
        save_time = System.currentTimeMillis();
        tx.commit();
        s.flush();
        s.clear();
    }
    catch(HibernateException ex)
    {
        if(tx != null)
            tx.rollback();
    }
    commit_time = System.currentTimeMillis();
    System.out.println("Save: " + (save_time - start_time));
    System.out.println("Commit: " + (commit_time - save_time));
    System.out.println();
}

The size of the collection is always less than 20. here is the timing data that i see:

Save: 27
Commit: 9

Save: 27
Commit: 9

Save: 26
Commit: 9

Save: 36
Commit: 9

Save: 44
Commit: 0

This is confusing to me. I figure that the save should be quick and all the time should be spent on commit. but clearly I'm wrong. I have also tried removing the transaction (its not really necessary) but i saw worse times... I have set hibernate.jdbc.batch_size=20...

I can expect to get as many as 500 messages/sec so i need single message handling to be under 20 milliseconds.

i need this operation to be as fast as possible, ideally there would only be one roundtrip to the database. How can i do this?


Solution

  • Move your primary key generation away from a server side auto-increment. Your Java code must be responsible for the PK generation to avoid round trips.

    For decent bulk insert performance, you need a method that won't need to hit the database on every single call to saveOrUpdate. Using UUIDs as the primary key, or implementing HiLo can help achieve this. Otherwise, there's no bulk insert actually going on.

    To have both performance and interoperability with other external systems, the pooled or the pooled-lo optimizers are the best choice.