javaspring-boothibernatejpabatch-insert

Hibernate batch update is turned on but still executes eache query seperatly


I'm trying to add a batch update to my spring boot project. The batch seems activated, but when I check the hibernate logs, there are still multiple queries.

hibernate.jdbc.batch_size=5

hibernate stats

295647400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
3693300 nanoseconds spent preparing 21 JDBC statements;
5752515000 nanoseconds spent executing 20 JDBC statements;
1275544900 nanoseconds spent executing 4 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
1287992700 nanoseconds spent executing 1 flushes (flushing a total of 19 entities and 0 collections);
735000 nanoseconds spent executing 2 partial-flushes (flushing a total of 1 entities and 1 collections)

hibernate log

2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,192 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl: Reusing batch statement
2022-10-09 19:21:16,193 DEBUG org.hibernate.engine.jdbc.spi.SqlStatementLogger: insert into SYSTEM.SNAPSHOT (CREATED_ON, ENTITY_PROPER_NAME, ID) values (?, ?, ?)
2022-10-09 19:21:16,193 DEBUG org.jboss.logging.DelegatingBasicLogger: Executing batch size: 5

I'm expecting hibernate to generate a statement like this below for Oracle Database.

INSERT ALL 
    INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
    INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
    INTO Snapshot ( created_on, entity_proper_name, id ) VALUES ( ?, ?, ? )
SELECT 1 FROM dual;

When I'm checking the executed statement into Jprofiler or directly into Oracle, it's the same as shown in the hibernate logs. The execution count is also the same as it should be if you run inserts separately.

enter image description here

Does the Hibernate batch work for the Oracle database?


Code snippets

Spring Boot v2.7.1 
Spring v5.3.21
Java 17.0.3.1
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

application.yml

spring:
  datasource:
    url: jdbc:oracle:thin:@localhost:1521/db
    username: user
    password: password
    driver-class-name: oracle.jdbc.OracleDriver
  jpa:
    database-platform: org.hibernate.dialect.Oracle12cDialect
    hibernate:
      naming:
        physical-strategy: org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl
      ddl-auto: update
    show-sql: true
    properties:
      hibernate:
        dialect: org.hibernate.dialect.Oracle12cDialect
        format_sql: false
        jdbc:
          fetch_size: 100
          batch_size: 5
        order_updates: true
        order_inserts: true
        batch_versioned_data: true
        generate_statistics: true

Snapshot entity

@Entity
@Table(name = "SNAPSHOT", schema = "SYSTEM", catalog = "")
public class Snapshot {
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Id
    @Column(name = "ID")
    private long id;
    @Basic
    @Column(name = "CREATED_ON")
    private String createdOn;
    ...
}

Snapshot service

@Transactional
    public void execute() {
    ...
    for (int i = 0; i < snapshots.size(); i++) {  
        snapshots.get(i).setFieldValue(fieldValue);
        snapshots.get(i).setCreatedOn(createdOn);
    }

    snapshotRepository.saveAll(snapshots);
       ...
}

pom.xml

<dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc11</artifactId>
    <version>21.7.0.0</version>
</dependency>

Solution

  • I wouldn't expect hibernate to use the special multiple-insert syntax, which is vendor specific. It rather uses so called (sometimes) array operations, which sends a parameterized query with an array of values (rows), which the oracle jdbc driver does support. Here's a random oracle article mentioning the stuff, but all major dbs have that possibility and here's some jdbc sample code using it (without hibernate).

    That's also sound with all your logs. Jdbc says it ran 4 batches for 20 inserts, with 5 rows per batch. Any speed up comes from reduced network round trips and only one lookup or parse operation per batched update in the db. Oracle's sql engine still does 1 insert per row with all associated pomp and circumstance, like constraint checking and index maintenance. The only way known to me to get around that is the oracle loader with direct path mode.

    But to verify that oracle actually uses batched statements is an issue, since its a data transfer feature. The sql engine will run separate statements again.

    For identical data one should see a significant speed increase of some 50% at least, although that may depend on the type of data inserted. This evaluation speaks of 500% compared to non-batched operations, but that's not much of a proof. You could provoke a database error for a defined number of rows on insert and then check the exception raised (should be some BatchUpdateException) and the actual number of rows processed. That might be enough of evidence that batching took place at the client side. Also a breakpoint in the jdbc code may do, OraclePreparedStatement.sendBatch() would be a candidate I guess. And perhaps you could debug the oracle executable. You'd have to use a connection that doesn't use the thin client, like you do (jdbc:oracle:thin...), but the OCI client (oracle call interface), the call you're searching for is OCIBindArrayOfStruct. For the db server side I have no idea honestly. Perhaps one could sniff out the IP traffic to determine the number of requests per n rows.

    As far as I'm concerned, I took the speed up for identical data as sufficient evidence, that batching took place. And even if it didn't, just the speed up was all I needed.