I wanted to know what effect the batchsize option has on an insert operation when writing with the Spark JDBC data source. Does it produce a single bulk INSERT statement, or a batch of individual INSERT commands that gets committed at the end?
Could someone clarify? This is not clearly explained in the documentation.
According to the source code, the batchsize option controls how many rows are submitted per call to the executeBatch method of PreparedStatement, which submits a batch of commands to the database for execution.
The key code (simplified from Spark's JdbcUtils; the per-column value setters are elided):

val stmt = conn.prepareStatement(insertStmt)
var rowCount = 0
while (iterator.hasNext) {
  val row = iterator.next()
  // ... set each column of `row` on `stmt` via the appropriate setters ...
  stmt.addBatch()
  rowCount += 1
  if (rowCount % batchSize == 0) {
    stmt.executeBatch()
    rowCount = 0
  }
}
if (rowCount > 0) {
  stmt.executeBatch()
}
Back to your question: yes, the rows are sent as a batch of insert commands. But "gets committed at the end" is not accurate, because it is fine for only part of those inserts to execute successfully; Spark imposes no extra transaction requirement here. By the way, Spark adopts the database's default isolation level if the isolationLevel option is not specified.
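For reference, both knobs are set as options on the DataFrame writer. A minimal sketch, assuming an existing DataFrame named df (the URL and table name here are placeholders):

```scala
// Sketch only: assumes a SparkSession and a DataFrame `df` already exist.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/mydb") // placeholder connection URL
  .option("dbtable", "target_table")                 // placeholder table name
  .option("batchsize", "10000")     // rows per executeBatch() call (default 1000)
  .option("isolationLevel", "NONE") // transaction isolation for the writes
  .mode("append")
  .save()
```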