To reproduce the issue:
docker run -e POSTGRES_USER=localtest -e POSTGRES_PASSWORD=localtest -e POSTGRES_DB=orders -p 5432:5432 -d postgres:15.3
Then run this code:
https://github.com/cuipengfei/Spikes/tree/master/jpa/lock-transaction-threads
@Controller
public class TestLockController {
private Logger logger = LoggerFactory.getLogger(TestLockController.class);
@Autowired
private TestLockService service;
@GetMapping("test-lock")
public ResponseEntity<String> testLock() {
logger.info("start");
Arrays.asList("a", "b", "c").parallelStream().forEach(key -> {
logger.info("going to call test lock method with key: {}", key);
service.testLock(key);
});
logger.info("end");
return ResponseEntity.ok("done");
}
}
When the /test-lock
api is called, it will use parallel stream to start new threads to run the testLock
method 3 time.
@Service
public class TestLockService {
private Logger logger = LoggerFactory.getLogger(TestLockService.class);
@Autowired
private JdbcLockRegistry jdbcLockRegistry;
@Transactional
public void testLock(String key) {
logger.info("{} start with transaction {}", key,
TransactionAspectSupport.currentTransactionStatus().hashCode());
boolean isLocked = false;
Lock lock = jdbcLockRegistry.obtain(key);
try {
pretendToDoWork(key);
// ↑ comment this out to see errors
isLocked = lock.tryLock(10, TimeUnit.SECONDS);
if (isLocked) {
logger.info("{} lock ok", key);
} else {
logger.info("{} lock failed", key);
}
} catch (Throwable t) {
logger.error("{} lock failed with exception: {}", key, t.getMessage());
} finally {
if (isLocked) {
logger.info("{} unlock", key);
lock.unlock();
} else {
logger.info("{} skip unlock since it was never locked", key);
}
}
logger.info("{} end with transaction {}", key,
TransactionAspectSupport.currentTransactionStatus().hashCode());
}
private void pretendToDoWork(String key) throws InterruptedException {
Random random = new Random();
int randomNumber = random.nextInt(3000);
logger.info("{} pretend to work for {} before try to lock", key, randomNumber);
Thread.sleep(randomNumber);
}
}
Then in the testLock
method of the service, it will try to get lock.
When I trigger this API, it prints the following log:
2023-12-22 21:49:26.188 INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : start
2023-12-22 21:49:26.189 INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : going to call test lock method with key: b
2023-12-22 21:49:26.189 INFO 20180 --- [nPool-worker-37] c.g.s.e.controllers.TestLockController : going to call test lock method with key: c
2023-12-22 21:49:26.190 INFO 20180 --- [nPool-worker-51] c.g.s.e.controllers.TestLockController : going to call test lock method with key: a
2023-12-22 21:49:26.208 INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a start with transaction 1915798295
2023-12-22 21:49:26.208 INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b start with transaction 353593802
2023-12-22 21:49:26.208 INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c start with transaction 1079367566
2023-12-22 21:49:26.209 INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a pretend to work for 2609 before try to lock
2023-12-22 21:49:26.209 INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b pretend to work for 2217 before try to lock
2023-12-22 21:49:26.209 INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c pretend to work for 918 before try to lock
2023-12-22 21:49:27.174 INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c lock ok
2023-12-22 21:49:27.174 INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c unlock
2023-12-22 21:49:27.182 INFO 20180 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c end with transaction 1079367566
2023-12-22 21:49:28.444 INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b lock ok
2023-12-22 21:49:28.444 INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b unlock
2023-12-22 21:49:28.449 INFO 20180 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b end with transaction 353593802
2023-12-22 21:49:28.834 INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a lock ok
2023-12-22 21:49:28.834 INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a unlock
2023-12-22 21:49:28.838 INFO 20180 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a end with transaction 1915798295
2023-12-22 21:49:28.838 INFO 20180 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : end
Then if I comment out this line:
pretendToDoWork(key);
// ↑ Comment this out to see errors!
Then run it again, it will print the following logs:
2023-12-22 21:36:30.192 INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : start
2023-12-22 21:36:30.192 INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : going to call test lock method with key: b
2023-12-22 21:36:30.193 INFO 39588 --- [nPool-worker-51] c.g.s.e.controllers.TestLockController : going to call test lock method with key: a
2023-12-22 21:36:30.193 INFO 39588 --- [nPool-worker-37] c.g.s.e.controllers.TestLockController : going to call test lock method with key: c
2023-12-22 21:36:30.210 INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c start with transaction 344273392
2023-12-22 21:36:30.211 INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b start with transaction 2109875677
2023-12-22 21:36:30.210 INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a start with transaction 1083949093
2023-12-22 21:36:30.241 INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b lock ok
2023-12-22 21:36:30.241 INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b unlock
2023-12-22 21:36:30.247 INFO 39588 --- [nio-8080-exec-1] c.g.s.example.service.TestLockService : b end with transaction 2109875677
2023-12-22 21:36:30.247 ERROR 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a lock failed with exception: Failed to lock mutex at 0cc175b9-c0f1-36a8-b1c3-99e269772661; nested exception is org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
2023-12-22 21:36:30.247 ERROR 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c lock failed with exception: Failed to lock mutex at 4a8a08f0-9d37-3737-9564-9038408b5f33; nested exception is org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
2023-12-22 21:36:30.247 INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a skip unlock since it was never locked
2023-12-22 21:36:30.247 INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c skip unlock since it was never locked
2023-12-22 21:36:30.247 INFO 39588 --- [nPool-worker-51] c.g.s.example.service.TestLockService : a end with transaction 1083949093
2023-12-22 21:36:30.247 INFO 39588 --- [nPool-worker-37] c.g.s.example.service.TestLockService : c end with transaction 344273392
2023-12-22 21:36:30.247 INFO 39588 --- [nio-8080-exec-1] c.g.s.e.controllers.TestLockController : end
Now I get errors.
Question is: there are 3 threads, each using a different transaction, each trying to get lock with a different key. They seem independent from each other. Why are they conflicting with each other when the pretendToDoWork
method is removed?
And if the above code is the wrong way to do it, what is the proper way to use the jdbc distributed lock in multiple threads using multiple transactions?
-----update 2023-12-25-----
code updated: https://github.com/cuipengfei/Spikes/tree/master/jpa/lock-transaction-threads/src/main/java/com/github/spring/example/service
Problem1Service.java: the issue described above
Problem1FixService.java: introduce a custom lock repo which uses its own separate trans manager
Problem2Service.java: based on problem 1 fix, run some sql before try to get lock, will get transaction isolation level change error due to running biz sql and lock in same transaction
Problem2BadFixService.java: try to fix problem 2 by making parallel stream not to use current thread, however this fix resutls in running biz sql outside of a trans
Problem2GoodFixService.java: try to fix problem 2 by narrowing down scope of trans and not mixing biz sql and lock in same trans
Does Problem2GoodFixService look like a proper way to use jdbc lock with multi-threads and @Transactional?
pretendToDoWork
method is removed?First of all, while the pretendToDoWork(key);
method is out of comment line; lock accesses occur at different times, but when the method is commented out: lock accesses occur simultaneously and you get an error.
To understand what's going on here; you can add t.printStackTrace();
to the catch block, or you can activate DEBUG
mode by adding logging.level.org.springframework.orm.jpa=DEBUG
to your application.properties
file.
The error you should pay attention to here is
org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions.
org.springframework.orm.jpa.JpaSystemException: Unable to commit against JDBC Connection; nested exception is org.hibernate.TransactionException: Unable to commit against JDBC Connection
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:331) ~[spring-orm-5.3.28.jar:5.3.28]
...
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-embed-core-9.0.76.jar:9.0.76]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.hibernate.TransactionException: Unable to commit against JDBC Connection
at org.hibernate.resource.jdbc.internal.AbstractLogicalConnectionImplementor.commit(AbstractLogicalConnectionImplementor.java:92) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:282) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:562) ~[spring-orm-5.3.28.jar:5.3.28]
... 89 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, during commit attempt.
Hint: The transaction might succeed if retried.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676) ~[postgresql-42.5.4.jar:42.5.4]
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366) ~[postgresql-42.5.4.jar:42.5.4]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356) ~[postgresql-42.5.4.jar:42.5.4]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:316) ~[postgresql-42.5.4.jar:42.5.4]
at org.postgresql.jdbc.PgConnection.executeTransactionCommand(PgConnection.java:928) ~[postgresql-42.5.4.jar:42.5.4]
at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:950) ~[postgresql-42.5.4.jar:42.5.4]
at com.zaxxer.hikari.pool.ProxyConnection.commit(ProxyConnection.java:387) ~[HikariCP-4.0.3.jar:na]
at com.zaxxer.hikari.pool.HikariProxyConnection.commit(HikariProxyConnection.java) ~[HikariCP-4.0.3.jar:na]
at org.hibernate.resource.jdbc.internal.AbstractLogicalConnectionImplementor.commit(AbstractLogicalConnectionImplementor.java:86) ~[hibernate-core-5.6.15.Final.jar:5.6.15.Final]
... 92 common frames omitted
https://github.com/spring-projects/spring-integration/issues/3733 there are explanations about the error and why it occurs.
The problem was solved a few versions later with PR https://github.com/spring-projects/spring-integration/pull/3782.
(Example; you can upgrade spring-integration-jdbc
in your pom.xml
file to 6.0.9
or upgrade your Spring Boot
version.)
If you want to solve the problem manually in the current version; the solution in the title: How to retry a PostgreSQL serializable transaction with Spring? will work.