java spring spring-integration distributed-lock spring-integration-jdbc

Time to live of DefaultLockRepository has no effect for 2 threads in the same Java process when using JdbcLockRegistry


The following code reproduces the issue:

https://github.com/cuipengfei/Spikes/blob/master/jpa/spring-jdbc-distributed-lock-issue/

Run the test cases in the above code to reproduce it.

Before running the test cases, start a PostgreSQL database in Docker:

docker run -e POSTGRES_USER=localtest -e POSTGRES_PASSWORD=localtest -e POSTGRES_DB=orders -p 5432:5432 -d postgres:9.6.12

[Screenshot] When running 2 workers in 2 separate Java processes, the TTL works as expected.

[Screenshot] When running both workers in one Java process, the TTL has no effect.

Aside from the above code, the real issue I am facing is this:

Request 1 hits server 1. While handling the request, the thread sometimes hangs (due to a weird issue in a 3rd-party jar which we cannot replace). When that happens, our code never gets a chance to release the lock.

Then request 2 comes in. If it hits another server, it's fine. But if it hits server 1, request 2 cannot get the lock, since the hanging thread never released it, and the TTL does not help in this case.

In summary: in one Java process, a thread gets the lock but, due to a bug or some other reason, never gets a chance to release it. Subsequent threads in this Java process then cannot obtain the same lock and cannot proceed with the unfinished job. In this case, are there any recommended ways to let the subsequent threads get the lock?


Solution

  • When you talk about one process (even with different threads), you are still asking the same repository instance, so the second request is aware of the first. When we deal with several repository instances (different processes, in your terms), they are not aware of each other, and we can avoid a deadlock only via the TTL (expire) property. This is a typical approach in distributed systems: it lets other instances take over the data when the locking instance might already be dead or crashed and so could not unlock.

    So we recommend a TTL long enough for the program's logic to complete, but short enough that other instances won't be parked for nothing for too long.

    In other words, the time-to-live is essentially an emergency procedure: the locking process might still be running when another process takes the lock because of expiration. Therefore the TTL has to be a bit longer than the time we hold the lock in our program.

    You can look into JdbcLockRegistry.expireUnusedOlderThan(), though. But just imagine what you would do if you weren't using JdbcLockRegistry but a regular ReentrantLock, since you are really talking about a single-JVM problem.
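To make the single-JVM point concrete, here is a minimal sketch using only a plain ReentrantLock, with no Spring Integration involved. The class name LocalLockDemo and the method secondThreadAcquires are hypothetical names made up for this illustration. One thread takes the lock and then hangs; the calling thread uses tryLock with a timeout so that it gives up instead of parking forever. Nothing here expires the lock: a local lock has no TTL, so the waiter can only bound its own wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LocalLockDemo {

    // Returns true if the calling thread could take the lock within waitMillis
    // while another thread in the same JVM holds it and hangs.
    public static boolean secondThreadAcquires(long waitMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread hanging = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                // Simulates the hanging 3rd-party call: this latch is never counted down.
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        hanging.setDaemon(true);
        hanging.start();

        try {
            held.await(); // make sure the first thread really owns the lock
            // Bounded wait: give up after waitMillis instead of blocking forever.
            boolean acquired = lock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        } finally {
            hanging.interrupt(); // clean up the simulated hang
        }
    }

    public static void main(String[] args) {
        System.out.println("second thread acquired: " + secondThreadAcquires(200));
        // prints "second thread acquired: false"
    }
}
```

The same idea carries over to locks obtained from a lock registry, since they implement java.util.concurrent.locks.Lock and therefore offer tryLock(long, TimeUnit). Whether a bounded wait is enough, or whether the hanging work itself needs a timeout or an interrupt, depends on the application.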