How can I avoid database deadlocks when using closure_tree to concurrently manipulate a set of models with common attributes on a hierarchical structure?
They present in the following flavors:
When issuing an #append/prepend_sibling
Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction:
UPDATE `elements` SET `sort_order` = `sort_order` + 1 WHERE (`parent_id` = 28035 AND `sort_order` >= 1)
Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction:
UPDATE `elements` SET `sort_order` = `sort_order` - 1 WHERE (`parent_id` = 21168 AND `sort_order` <= -1)
When rebuilding the closure table
Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction:
DELETE FROM `element_hierarchies`
WHERE descendant_id IN (
SELECT DISTINCT descendant_id
FROM ( SELECT descendant_id
FROM `element_hierarchies`
WHERE ancestor_id = 16332
) AS x )
OR descendant_id = 16332
Mysql2::Error: Deadlock found when trying to get lock; try restarting transaction:
INSERT INTO `element_hierarchies` (`ancestor_id`, `descendant_id`, `generations`) VALUES (30910, 30910, 0)
with_advisory_lock looks promising. Any thoughts?
Author of closure_tree here:
Mr. Heal's advice is generally correct—you should acquire table locks in the same order to prevent deadlocks. In this case, though, the deadlock is due to row-level locks in the hierarchy table.
Interesting that you suggest using with_advisory_lock! I just wrote that library for closure_tree, and if you're using version >= 3.7.0, there are already advisory locks safeguarding the class-level #rebuild
and #find_or_create_by_path
methods.
The problem with advisory locks (at least with MySQL and PostgreSQL) is that they don't respect transaction boundaries—if the lock-holding caller doesn't commit their transaction before the lock is released, other connections will not see those changes when they try to acquire the advisory lock, so we need to be careful here. We may need to add a table lock on the hierarchy table for any writes, but that would be worst-case.
I've opened issue 41, and we can track it there. First thing to do is to reliably reproduce the deadlock in a parallel test. We already have tests that do that for #rebuild
and #find_or_create_by_path
.