I have to improve the performance of some very slow code, and I am pretty new to Hibernate. I have studied the code carefully and concluded that the issue is that it has a large set of entities to load and update/insert. To translate the algorithm to a more digestible example, let's say we have an algorithm like this:
for each competitionToSave in competitionsToSave
    competition <- load a Competition by competitionToSave from database
    winner <- load Person by competitionToSave.personID
    do some preprocessing
    if (newCompetition) then
        insert competition
    else
        update competition
    end if
end for
This algorithm is of course problematic when there are lots of competitions in competitionsToSave. So my plan is to select all the competitions and winners involved with at most two database requests and preprocess the data, which will speed up the reads, but, more importantly, to make sure I save via insert/update in batches of 100 competitions instead of saving them one at a time. Since I am pretty new to Hibernate, I consulted the documentation and found the following example:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
However, I am not sure I understand it correctly. About the method .save() I read:
Persist the given transient instance, first assigning a generated identifier. (Or using the current value of the identifier property if the assigned generator is used.) This operation cascades to associated instances if the association is mapped with cascade="save-update".
But it is unclear to me whether a request is sent to the database upon every save. Am I correct to assume that, in the example taken from the documentation, session.save(customer) saves the modification of the object in the Session without sending a request to the database, and that on every 20th item session.flush() sends the request to the database and session.clear() empties the cache of the Session?
You are correct in your assumptions, though the inserts will be triggered one-by-one:
insert into Customer(id , name) values (1, 'na1');
insert into Customer(id , name) values (2, 'na2');
insert into Customer(id , name) values (3, 'na3');
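To picture the save/flush/clear behaviour described above, here is a toy model in plain Java. It is only a sketch: `ToySession` and `ToySessionDemo` are made-up names, not Hibernate classes, and the model assumes an identifier generator that lets the insert be deferred until flush (with an IDENTITY generator, Hibernate has to run the insert at save time to obtain the id). The loop mirrors the documentation example, with the final `flush()` standing in for the flush that `tx.commit()` triggers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a Session. This is NOT Hibernate code, only a mental model.
class ToySession {
    private final List<Object> managed = new ArrayList<>();        // first-level cache
    private final List<String> pendingInserts = new ArrayList<>(); // queued, not yet sent
    final List<Integer> statementsPerFlush = new ArrayList<>();    // what reached the "database"

    // save(): register the entity and queue its INSERT; nothing is sent yet.
    void save(Object entity) {
        managed.add(entity);
        pendingInserts.add("insert " + entity);
    }

    // flush(): send every queued statement to the "database".
    void flush() {
        if (pendingInserts.isEmpty()) {
            return;
        }
        statementsPerFlush.add(pendingInserts.size());
        pendingInserts.clear();
    }

    // clear(): forget managed instances and cancel anything still pending,
    // which is why the loop flushes before it clears.
    void clear() {
        managed.clear();
        pendingInserts.clear();
    }

    int managedCount() {
        return managed.size();
    }
}

class ToySessionDemo {
    // Same loop shape as the documentation example.
    static List<Integer> runDocumentationLoop(int total, int flushEvery) {
        ToySession session = new ToySession();
        for (int i = 0; i < total; i++) {
            session.save("customer-" + i);
            if (i % flushEvery == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush(); // stands in for the flush that tx.commit() triggers
        return session.statementsPerFlush;
    }

    public static void main(String[] args) {
        System.out.println(runDocumentationLoop(45, 20)); // prints [1, 20, 20, 4]
    }
}
```

Note that the `i % 20 == 0` condition fires at `i = 0`, so the very first flush in this model carries a single insert; every later flush carries 20.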
You can take advantage of the batch insert feature to increase the performance even more.
There is a Hibernate property which you can define as one of the properties of Hibernate's SessionFactory:
<property name="jdbc.batch_size">20</property>
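If the configuration is built in code rather than in XML, the same setting could be expressed as plain `java.util.Properties`. This is a sketch under assumptions: `BatchSettings` is a hypothetical helper, the key is written in its fully qualified form `hibernate.jdbc.batch_size`, and the two ordering properties are optional extras that are commonly enabled together with batching so that statements for the same entity type end up next to each other.

```java
import java.util.Properties;

// Hypothetical helper that collects batching-related Hibernate settings.
class BatchSettings {
    static Properties batchProperties(int batchSize) {
        Properties props = new Properties();
        // Number of statements Hibernate groups into one JDBC batch.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Optional: sort inserts/updates by entity type so batches are not
        // interrupted when different entity types are saved alternately.
        props.setProperty("hibernate.order_inserts", "true");
        props.setProperty("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        // In a real application these would be handed to Hibernate's
        // bootstrap, e.g. via Configuration#addProperties (not done here).
        Properties props = batchProperties(20);
        System.out.println(props.getProperty("hibernate.jdbc.batch_size")); // prints 20
    }
}
```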
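With this batch setting, Hibernate sends the inserts of each flush to the database as one JDBC batch instead of one statement per round trip. Depending on the JDBC driver, the batch may even be rewritten into a single multi-row statement like this (for example, MySQL Connector/J with rewriteBatchedStatements=true):
insert into Customer(id, name) values (1, 'na1'), (2, 'na2'), (3, 'na3')..
One round trip instead of twenty. Note that Hibernate disables insert batching for entities that use an IDENTITY identifier generator.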
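Applied to the batches of 100 competitions from the question, the flush-and-clear loop could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `BatchTarget` is a hypothetical interface that only mirrors the three session calls used here, so the example runs without Hibernate, and `RecordingTarget` merely records how many entities each flush would carry. Unlike the documentation loop, it flushes after every full batch and once more for the remainder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal view of the session calls the loop needs.
interface BatchTarget {
    void saveOrUpdate(Object entity);
    void flush();
    void clear();
}

// Test double that records how many entities each flush would send.
class RecordingTarget implements BatchTarget {
    final List<Integer> flushSizes = new ArrayList<>();
    private int pending = 0;

    @Override
    public void saveOrUpdate(Object entity) {
        pending++;
    }

    @Override
    public void flush() {
        if (pending > 0) {
            flushSizes.add(pending);
            pending = 0;
        }
    }

    @Override
    public void clear() {
        pending = 0;
    }
}

class BatchWriter {
    // Saves all entities, flushing and clearing after every full batch
    // and once more at the end for a partial batch.
    static void writeInBatches(List<?> entities, BatchTarget target, int batchSize) {
        int inCurrentBatch = 0;
        for (Object entity : entities) {
            target.saveOrUpdate(entity);
            inCurrentBatch++;
            if (inCurrentBatch == batchSize) {
                target.flush();
                target.clear();
                inCurrentBatch = 0;
            }
        }
        if (inCurrentBatch > 0) {
            target.flush();
            target.clear();
        }
    }

    public static void main(String[] args) {
        List<String> competitions = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            competitions.add("competition-" + i);
        }
        RecordingTarget target = new RecordingTarget();
        writeInBatches(competitions, target, 100);
        System.out.println(target.flushSizes); // prints [100, 100, 50]
    }
}
```

Keeping the loop's batch size equal to the configured JDBC batch size, as the comment in the documentation example suggests, means each flush maps onto whole JDBC batches.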