I'm trying to add a batch update to my spring boot project. The batch seems activated, but when I check the hibernate logs, there are still multiple queries.
hibernate.jdbc.batch_size=5
hibernate stats
295647400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
3693300 nanoseconds spent preparing 21 JDBC statements;
5752515000 nanoseconds spent executing 20 JDBC statements;
1275544900 nanoseconds spent executing 4 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
1287992700 nanoseconds spent executing 1 flushes (flushing a total of 19 entities and 0 collections);
735000 nanoseconds spent executing 2 partial-flushes (flushing a total of 1 entities and 1 collections)
hibernate log
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.jboss.logging.DelegatingBasicLogger: Executing batch size: 5
I'm expecting hibernate to generate a statement like this below for Oracle Database.
INSERT ALL
INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
SELECT 1 FROM dual;
When I'm checking the executed statement into Jprofiler or directly into Oracle, it's the same as shown in the hibernate logs. The execution count is also the same as it should be if you run inserts separately.
Does the Hibernate batch work for the Oracle database?
Spring Boot v2.7.1
Spring v5.3.21
Java 17.0.3.1
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
application.yml
spring:
datasource:
url: jdbc:oracle:thin:@localhost:1521/db
username: user
password: password
driver-class-name: oracle.jdbc.OracleDriver
jpa:
database-platform: org.hibernate.dialect.Oracle12cDialect
hibernate:
naming:
physical-strategy: org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl
ddl-auto: update
show-sql: true
properties:
hibernate:
dialect: org.hibernate.dialect.Oracle12cDialect
format_sql: false
jdbc:
fetch_size: 100
batch_size: 5
order_updates: true
order_inserts: true
batch_versioned_data: true
generate_statistics: true
Snapshot entity
@Entity
@Table(name = "SNAPSHOT", schema = "SYSTEM", catalog = "")
public class Snapshot {
@GeneratedValue(strategy = GenerationType.SEQUENCE)
@Id
@Column(name = "ID")
private long id;
@Basic
@Column(name = "CREATED_ON")
private String createdOn;
...
}
Snapshot service
@Transactional
public void execute() {
...
for (int i = 0; i < snapshots.size(); i++) {
snapshots.get(i).setFieldValue(fieldValue);
snapshots.get(i).setCreatedOn(createdOn);
}
snapshotRepository.saveAll(snapshots);
...
}
pom.xml
<dependency>
<groupId>com.oracle.database.jdbc</groupId>
<artifactId>ojdbc11</artifactId>
<version>21.7.0.0</version>
</dependency>
I wouldn't expect hibernate to use the special multiple-insert syntax, which is vendor specific. It rather uses so called (sometimes) array operations, which sends a parameterized query with an array of values (rows), which the oracle jdbc driver does support. Here's a random oracle article mentioning the stuff, but all major dbs have that possibility and here's some jdbc sample code using it (without hibernate).
That's also sound with all your logs. Jdbc says it ran 4 batches for 20 inserts, with 5 rows per batch. Any speed up comes from reduced network round trips and only one lookup or parse operation per batched update in the db. Oracle's sql engine still does 1 insert per row with all associated pomp and circumstance, like constraint checking and index maintenance. The only way known to me to get around that is the oracle loader with direct path mode.
But to verify that oracle actually uses batched statements is an issue, since its a data transfer feature. The sql engine will run separate statements again.
For identical data one should see a significant speed increase of some 50% at least, although that may depend on the type of data inserted. This evaluation speaks of 500% compared to non-batched operations, but that's not much of a proof. You could provoke a database error for a defined number of rows on insert and then check the exception raised (should be some BatchUpdateException
) and the actual number of rows processed. That might be enough of evidence that batching took place at the client side. Also a breakpoint in the jdbc code may do, OraclePreparedStatement.sendBatch()
would be a candidate I guess. And perhaps you could debug the oracle executable. You'd have to use a connection that doesn't use the thin client, like you do (jdbc:oracle:thin...
), but the OCI client (oracle call interface), the call you're searching for is OCIBindArrayOfStruct
. For the db server side I have no idea honestly. Perhaps one could sniff out the IP traffic to determine the number of requests per n rows.
As far as I'm concerned, I took the speed up for identical data as sufficient evidence, that batching took place. And even if it didn't, just the speed up was all I needed.