Tags: oracle, plsql, deadlock, job-scheduling

Managing PL/SQL Jobs to Avoid Deadlocks on a Large Table


I have a PL/SQL procedure that submits jobs in a loop, processing different parameters (branches). Each job performs inserts and updates on a large table. Due to the size of the table, a table lock is required, and consequently, I frequently encounter deadlock errors, causing at least one job to fail each day.
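In outline, the submission loop is shaped like this (an illustrative sketch only — DBMS_SCHEDULER and all identifiers below are stand-ins, not my actual code):

    -- Sketch: one job per branch, all started at once.
    BEGIN
      FOR r IN (SELECT branch_id FROM branches) LOOP
        DBMS_SCHEDULER.create_job(
          job_name   => 'PROCESS_BRANCH_' || r.branch_id,
          job_type   => 'PLSQL_BLOCK',
          job_action => 'BEGIN process_branch(' || r.branch_id || '); END;',
          enabled    => TRUE,    -- each job starts immediately, so they all run concurrently
          auto_drop  => TRUE);
      END LOOP;
    END;
    /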

Given this situation, I am seeking advice on how to manage these jobs to ensure they execute without deadlocks. Specifically:

How can I make each job wait for the previous job to finish before starting?

Is there a way to execute these jobs in parallel without running into deadlock issues, especially as the table size continues to increase?

Any guidance on improving the job execution strategy to avoid deadlocks would be greatly appreciated.


Solution

  • By "table lock" I hope you don't literally mean you are doing "lock table...". If you are doing this, remove that and you may solve your issue right there. Assuming that this isn't the case...

    Oracle utilizes row-level locking (unless you have an Exadata with HCC compression enabled). This means that DML locks on a table are taken out on individual rows (there's a lock bit on every row in the data blocks). So, if you are getting lock waits, you have more than one job trying to lock the same row. The first rule of programming for concurrency is to divide your workload up in such a way that no two concurrent jobs will ever need to work on the same row.
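    Since your jobs are already keyed by branch, the cleanest split is to make every statement filter on that key. A minimal sketch, assuming the large table has a branch_id column (adjust names to your schema):

        -- Each job touches only its own branch's rows, so the jobs' row sets never overlap.
        UPDATE big_table t
           SET t.status = 'PROCESSED'          -- placeholder column/value
         WHERE t.branch_id = :p_branch_id;     -- the branch this job owns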

    A deadlock is a situation where session A and session B lock different rows, then, in the same transaction, each attempts to obtain a lock on a row the other has already locked. This creates a circular dependency that will never resolve on its own, so Oracle detects it, picks one unlucky session, and rolls back its statement with ORA-00060. To reduce the chances of this, follow these design principles:

    1. Keep your transactions as short as possible. If you can, commit after every DML operation. Shorter transactions mean much less opportunity for the circular lock chain to occur.

    2. Operate on single rows at a time rather than on row sets. In a loop, update a row, commit, update the next row, commit, and so on, rather than updating 1000 rows in a single statement. A deadlock can occur within a single DML statement if it locks multiple rows - the circular locking chain can develop while the statement is in the process of locking later rows and still holding the locks it took on earlier ones. So again, we want to hold each lock as briefly and as atomically as possible. One transaction per row affected will do it (see the row-at-a-time sketch after this list).

    3. If you have a sequence of operations within each job, ensure the sequence goes the same direction every time. If you touch table A then B then C, ensure all your concurrent jobs go in that same order. This is usually the case but check just to be sure.

    4. Think carefully about foreign keys. Inserting a child row will lock its parent row. Deleting a parent row will take out a lock on the child table - a full table lock if the foreign key column is unindexed, so make sure your FK columns are indexed. If you are operating on both parent and child tables and have foreign keys in place, this is a situation ripe for deadlocks. If you can't solve it with programming changes and are okay without referential integrity protection, you may at some point even need to consider dropping the FK constraint.

    5. Deadlocks can also develop because you have insufficient room in an index block or table block (less than 23 bytes free) for Oracle to add another ITL slot to handle an extra concurrent transaction. Be sure you have enough ITL slots, either by using a healthy PCTFREE (the default of 10 is usually sufficient) for both tables and indexes, and/or by setting INITRANS to a higher value. If I know I will have 10 threads working on the same table at the same time, I like to set INITRANS 10 on the table and on every index (you must move/rebuild the segments for it to take effect) just to be sure there is never an ITL shortage (see the DDL sketch after this list).

    6. There are other ways a deadlock can occur, more than I can survey here. When a deadlock does happen, Oracle generates a trace file that shows the locking chain. Ask your DBA to locate one of these trace files (the query after this list shows where they are written) and provide it to you. This will help you figure out how you're getting into the deadlock so you can target your solution better.
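    To illustrate points 1 and 2, here is a minimal row-at-a-time sketch (table, column, and procedure names are placeholders):

        -- One row, one transaction: each row lock is held only for an instant.
        CREATE OR REPLACE PROCEDURE process_branch (p_branch_id IN NUMBER) AS
        BEGIN
          FOR r IN (SELECT id FROM big_table WHERE branch_id = p_branch_id) LOOP
            UPDATE big_table
               SET status = 'PROCESSED'    -- placeholder work
             WHERE id = r.id;
            COMMIT;                        -- release the row lock immediately
          END LOOP;
        END;
        /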
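    For point 5, the DDL might look like this (segment names are placeholders; MOVE/REBUILD applies the new setting to existing blocks):

        -- Pre-allocate ITL slots for 10 concurrent jobs.
        ALTER TABLE big_table MOVE INITRANS 10;
        -- MOVE marks the table's indexes UNUSABLE, so rebuild each one anyway:
        ALTER INDEX big_table_pk REBUILD INITRANS 10;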
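    And for point 6, the trace files land in the database's diagnostic trace directory, which you or your DBA can locate with:

        -- Directory where deadlock (ORA-00060) trace files are written:
        SELECT value FROM v$diag_info WHERE name = 'Diag Trace';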

    Lastly,

    1. If you are doing bulk processing rather than surgical OLTP-style row processing, drop the multithreaded model and use a single session/job to do all the work. This doesn't mean you have to sacrifice performance - if you utilize parallel query and parallel DML (PDML), one job can leverage many CPUs' worth of power with very little risk of deadlocks (unless, again, it's an Exadata with HCC compression). A sketch follows.
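    A minimal PDML sketch (the degree and names are illustrative):

        -- One session does all the work in parallel instead of many competing jobs.
        ALTER SESSION ENABLE PARALLEL DML;

        UPDATE /*+ PARALLEL(t 8) */ big_table t
           SET t.status = 'PROCESSED';

        COMMIT;   -- after PDML, the session must commit before it can query the table again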