I need to make a batch process in Java that will read a file with a variable number of records (though I can safely assume 5k+), process them, and then insert the records into an Oracle 11g database. I need to do it via call(s) to a stored procedure.
I made a similar process some months ago that I am not too proud of (for no particular reason other than being sure it is far from optimal). In that solution I kept everything in memory and, to insert the rows, built one INSERT ALL statement by appending all the records with a StringBuilder, then executed that single statement to insert 15k+ records in one go. This time I need to use a stored procedure for the insertion of the data. I have been reading and now know there are ways to send arrays of data to stored procedures, so I would be able to send multiple records at a time.
Should I make a stored procedure that receives arrays and send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain number of records at a time and call that SP (records / limit) times?
Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?
If I were to do multiple calls, I was thinking of using PreparedStatements with the .addBatch() and .executeBatch() methods; would this be the way to go?
I need to be able to insert all the records and roll back in case of any error, for which I am going to use transactions. I am technically not required to meet any performance threshold, but I am interested in the topic and this could be a good time to start worrying more about it, so I would like some pointers and tips from someone with experience in the topic.
"Should I make a Stored procedure that receives arrays and just send all - potentially thousands - of the records in one single call to that SP? Or should I limit it to a certain amount of records at a time and call that SP (records/limit) amount of times?"
Limit each call to a certain number of records. I generally start with between 100 and 1,000, depending on the total size of a record.
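A minimal sketch of that chunking, assuming a stored procedure insert_records that takes two Oracle collection parameters backed by SQL types T_ID_TAB and T_VAL_TAB - those names and the FileRecord class are illustrative rather than anything from the question, and the array factory method is specific to the Oracle JDBC driver:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

import oracle.jdbc.OracleConnection;

public class ChunkedSpLoader {

    private static final int CHUNK_SIZE = 500; // somewhere in the 100-1,000 range

    /** Calls the stored procedure once per chunk, passing each chunk as arrays. */
    public static void load(Connection conn, List<FileRecord> records) throws SQLException {
        OracleConnection oraConn = conn.unwrap(OracleConnection.class);
        try (CallableStatement cs = conn.prepareCall("{call insert_records(?, ?)}")) {
            for (int from = 0; from < records.size(); from += CHUNK_SIZE) {
                List<FileRecord> chunk =
                        records.subList(from, Math.min(from + CHUNK_SIZE, records.size()));

                Object[] ids  = chunk.stream().map(FileRecord::getId).toArray();
                Object[] vals = chunk.stream().map(FileRecord::getValue).toArray();

                // createARRAY is Oracle-driver specific; the SQL type names
                // (T_ID_TAB, T_VAL_TAB) must exist in the database.
                cs.setArray(1, oraConn.createARRAY("T_ID_TAB", ids));
                cs.setArray(2, oraConn.createARRAY("T_VAL_TAB", vals));
                cs.execute();
            }
        }
    }

    /** Minimal placeholder for whatever one line of the input file maps to. */
    public static class FileRecord {
        private final long id;
        private final String value;
        public FileRecord(long id, String value) { this.id = id; this.value = value; }
        public long getId()      { return id; }
        public String getValue() { return value; }
    }
}
```

The CallableStatement is prepared once and reused; only the array contents change from one call to the next.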
"Or should I stay away from using arrays, and have a stored procedure that just receives the information for one record, and call it as many times as I have records?"
No. You would waste CPU and, above all, time: every round trip from Java to the database spends time just sending the message and getting the reply back (network latency).
"If I were to do multiple calls, I was thinking of utilizing PreparedStatements and the .addBatch() and .executeBatch() methods, would this be the way to go?"
Yes, but those methods are at their best with SQL statements (such as INSERT), not calls to stored procedures.
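For comparison, a sketch of the batched-INSERT route with bind variables; the table and column names are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedInsert {

    private static final int BATCH_SIZE = 500;

    /** Batches plain INSERTs with bind variables: one round trip per BATCH_SIZE rows. */
    public static void insertAll(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)"; // illustrative names
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();   // send the accumulated batch to the database
                }
            }
            ps.executeBatch();           // flush the final partial batch
        }
    }
}
```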
"I need to be able to insert all the records and roll back in case of any error."
Set autocommit off (which I recommend in general), commit when all is OK, and roll back in the error handler if anything fails.
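A sketch of that transaction handling, wrapped around whichever insert approach is used (the SqlWork callback is just a hypothetical stand-in for the loading code):

```java
import java.sql.Connection;
import java.sql.SQLException;

public class TransactionalLoad {

    /** Hypothetical callback for whatever does the actual inserts. */
    public interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /** Runs the whole load in one transaction: commit on success, roll back on any error. */
    public static void runInTransaction(Connection conn, SqlWork work) throws SQLException {
        conn.setAutoCommit(false);   // autocommit off, as recommended above
        try {
            work.run(conn);          // e.g. the chunked SP calls or the batched inserts
            conn.commit();           // everything succeeded: make it permanent
        } catch (SQLException e) {
            conn.rollback();         // any failure undoes every insert in this run
            throw e;
        }
    }
}
```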
If your stored procedures have no added value and simply do the inserts, then it would be simpler and very efficient to do batched inserts, as in the sketch above. There are very good arguments for using stored procedures despite the extra complication, but then you would have to populate the arrays.
In every case, it is vital to use bind variables and not concatenate the values into the SQL statements (or calls to SPs) as literals.
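A quick contrast between concatenated literals and bind variables, again with made-up table and column names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BindVsLiteral {

    // DON'T: concatenating values builds a new SQL text per row, defeats the
    // statement cache and is open to SQL injection.
    static void insertWithLiterals(Connection conn, long id, String val) throws SQLException {
        String sql = "INSERT INTO target_table (col1, col2) VALUES (" + id + ", '" + val + "')";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.executeUpdate();
        }
    }

    // DO: bind variables keep one reusable statement; only the values travel per execution.
    static void insertWithBinds(Connection conn, long id, String val) throws SQLException {
        String sql = "INSERT INTO target_table (col1, col2) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            ps.setString(2, val);
            ps.executeUpdate();
        }
    }
}
```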
Best regards, Stew Ashton
P.S. For 5k+ records, multi-threading is overkill.