If you need to insert, say, 350,000 rows, doing 350,000 separate inserts will incur a lot of network overhead. Beter is to do a smaller number of batch inserts, e.g instead of:
insert into player(player_id, league_id, games) values (?,?,?)
it should be doing either:
insert into player(player_id, league_id, games) values (?,?,?),(?,?,?),(?,?,?),(?,?,?)...
or similar.
If we enable batching:
spring:
jpa:
properties:
hibernate:
jdbc:
batch_size: 10000
it has no effect, in that it still generates 1 insert per 1 record, instead of total_records/10,000 inserts in this case.
Any idea why?
Here is my Entity:
@IdClass(PlayerCompositeId.class)
public class Player implements Persistable {
@Id
Long leagueId;
@Id
Long playerId;
Integer numGames;
@Override
public boolean isNew() {
return true;
}
}
@NoArgsConstructor
@AllArgsConstructor
@Data
public class PlayerCompositeId implements Serializable {
private Long leagueId;
private Long playerId;
}
and the code is this:
List<Player> players = readPlayers(); // returns 25k players
players.saveAll()
What we see in the logs is 25k lines like this:
insert into players (league_id, player_id, num_games) values (?, ?, ?)
What I expected to see is 25,000/10,000 = 25 log lines like this:
insert into players (league_id, player_id, num_games) values (?, ?, ?),(?,?,?),(?,?,?)...
There is similar post Hibernate Batch Insert. Would it ever use one insert instead of multiple inserts? but about Oracle
Hibernate may log individual INSERT
statements, but it can still send them as a single batch under the hood.
So seeing multiple insert logs doesn’t always mean batching isn’t working.
To be sure, enable Hibernate statistics with:
spring.jpa.properties.hibernate.generate_statistics=true
.
With this config you should see how many batches were executed:
xxxx nanoseconds spent executing 35 JDBC batches;
There is a post how to enable additional logging - Hibernate - how to verify if batch insert is really performed
But I have not tested the additional logging approach