I noticed that my applications often write values to a database that depend on a previous read operation. A common example is a bank account where a user deposits money:
void deposit(amount) {
balance = getAccountBalance()
setAccountBalance(balance + amount)
}
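For concreteness, here is roughly what that read-then-write pattern looks like at the SQL level; I'll sketch it with JDBC, and the accounts table, its id/balance columns and the connection handling are made up for illustration:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NaiveDeposit {

    // Naive read-then-write: the race window lies between the SELECT and the UPDATE.
    public static void deposit(Connection con, long accountId, BigDecimal amount) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement read = con.prepareStatement(
                 "SELECT balance FROM accounts WHERE id = ?");
             PreparedStatement write = con.prepareStatement(
                 "UPDATE accounts SET balance = ? WHERE id = ?")) {

            read.setLong(1, accountId);
            BigDecimal balance;
            try (ResultSet rs = read.executeQuery()) {
                rs.next();
                balance = rs.getBigDecimal(1);           // getAccountBalance()
            }

            write.setBigDecimal(1, balance.add(amount)); // setAccountBalance(balance + amount)
            write.setLong(2, accountId);
            write.executeUpdate();
        }
        con.commit();
    }
}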
I want to avoid the race condition that occurs if this method is called by two threads/clients/ATMs simultaneously, as in the following interleaving, where the account owner loses money:
Transaction 1 (ATM 1)                   Transaction 2 (ATM 2)
balance = getAccountBalance()           |
|                                       balance = getAccountBalance()
setAccountBalance(balance + amount)     |
|                                       // balance2 = getAccountBalance() // theoretical
|                                       setAccountBalance(balance + amount)
V (time)
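In case you want to reproduce this interleaving rather than rely on timing luck, a sketch like the following forces both transactions to read before either writes (same made-up accounts table, placeholder connection URL and credentials; READ COMMITTED is set explicitly, so only ordinary row locks are in play and the second UPDATE simply waits and then overwrites):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.CountDownLatch;

public class LostUpdateDemo {

    static final String URL = "jdbc:postgresql://localhost/bank"; // placeholder connection URL
    static final CountDownLatch bothHaveRead = new CountDownLatch(2);

    public static void main(String[] args) throws Exception {
        Thread atm1 = new Thread(() -> deposit(1L, new BigDecimal("50")));
        Thread atm2 = new Thread(() -> deposit(1L, new BigDecimal("50")));
        atm1.start();
        atm2.start();
        atm1.join();
        atm2.join();
        // Starting from a balance of 100, the result is 150, not 200: one deposit is lost.
    }

    static void deposit(long accountId, BigDecimal amount) {
        try (Connection con = DriverManager.getConnection(URL, "user", "password")) {
            con.setAutoCommit(false);
            con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

            BigDecimal balance;
            try (PreparedStatement read = con.prepareStatement(
                     "SELECT balance FROM accounts WHERE id = ?")) {
                read.setLong(1, accountId);
                try (ResultSet rs = read.executeQuery()) {
                    rs.next();
                    balance = rs.getBigDecimal(1);
                }
            }

            bothHaveRead.countDown();
            bothHaveRead.await();      // make sure both transactions have read the old balance

            try (PreparedStatement write = con.prepareStatement(
                     "UPDATE accounts SET balance = ? WHERE id = ?")) {
                write.setBigDecimal(1, balance.add(amount));
                write.setLong(2, accountId);
                write.executeUpdate(); // the second writer blocks here until the first commits
            }
            con.commit();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}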
I often read that Repeatable Read or Serializable can solve this problem. Even the German Wikipedia article on Lost Updates states this. Translated to English:
The isolation level RR (Repeatable Read) is often mentioned as a solution to the lost update problem.
This SO answer suggests Serializable for a similar problem with INSERT after SELECT.
As far as I understand the idea: at the time the process on the right side tries to set the account balance, a (theoretical) read operation would no longer return the same balance, therefore the write operation is not allowed. And yes, if you read this popular SO answer, it actually sounds like a perfect fit:
under REPEATABLE READ the second SELECT is guaranteed to display at least the rows that were returned from the first SELECT unchanged. New rows may be added by a concurrent transaction in that one minute, but the existing rows cannot be deleted nor changed.
But then I wondered what "cannot be deleted nor changed" actually means. What happens if you try to delete or change it anyway? Will you get an error? Or will your transaction wait until the first transaction has finished and then perform its update after all? This makes all the difference: in the second case you will still lose money.
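One way to find out for a specific DBMS is simply to try it with two connections. Here is a sketch of such an experiment (same made-up accounts table, placeholder connection details); the comments describe the outcomes I would expect based on the vendor documentation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RepeatableReadExperiment {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost/bank"; // placeholder; swap in your DBMS
        try (Connection t1 = DriverManager.getConnection(url, "user", "password");
             Connection t2 = DriverManager.getConnection(url, "user", "password")) {

            for (Connection c : new Connection[] {t1, t2}) {
                c.setAutoCommit(false);
                c.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            }

            // T2 (the "right side") reads the balance first.
            long staleBalance;
            try (Statement s = t2.createStatement();
                 ResultSet rs = s.executeQuery("SELECT balance FROM accounts WHERE id = 1")) {
                rs.next();
                staleBalance = rs.getLong(1);
            }

            // T1 (the "left side") deposits and commits in the meantime.
            try (Statement s = t1.createStatement()) {
                s.executeUpdate("UPDATE accounts SET balance = balance + 50 WHERE id = 1");
            }
            t1.commit();

            // T2 now tries to write a value based on its stale read anyway.
            // Depending on the DBMS this either fails immediately (PostgreSQL under
            // REPEATABLE READ reports SQLSTATE 40001) or goes through and silently
            // overwrites T1's deposit (MySQL/InnoDB).
            try (PreparedStatement upd = t2.prepareStatement(
                     "UPDATE accounts SET balance = ? WHERE id = 1")) {
                upd.setLong(1, staleBalance + 50);
                upd.executeUpdate();
                t2.commit();
                System.out.println("T2's update went through");
            } catch (SQLException e) {
                System.out.println("T2 was rejected: " + e.getSQLState() + " " + e.getMessage());
                t2.rollback();
            }
        }
    }
}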
And if you read the comments below, it gets even worse, because there are other ways to meet the Repeatable Read conditions, for example snapshots: a snapshot could be taken before the left-side transaction writes its value, which makes it possible to return the original value if a second read occurs later in the right-side transaction. See, for instance, the MySQL manual:
Consistent reads within the same transaction read the snapshot established by the first read
I came to the conclusion that tightening the transaction isolation level is probably the wrong tool to get rid of the race condition. If it solves the problem (for a specific DBMS), that is not due to the definition of Repeatable Read, but because of the specific implementation used to fulfil the Repeatable Read conditions, for instance the use of locks.
So, to me it looks like this: what you actually need to solve this issue is a locking mechanism; the fact that some DBMSs use locks to implement Repeatable Read is merely being exploited.
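For completeness, the kind of explicit locking I have in mind is a SELECT ... FOR UPDATE around the read, sketched below against the same made-up accounts table. An even simpler alternative is a single atomic UPDATE accounts SET balance = balance + ?, which sidesteps the read-modify-write cycle entirely:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LockingDeposit {

    // SELECT ... FOR UPDATE locks the row, so a concurrent deposit on the same
    // account has to wait until this transaction commits; no update is lost.
    public static void deposit(Connection con, long accountId, BigDecimal amount) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement read = con.prepareStatement(
                 "SELECT balance FROM accounts WHERE id = ? FOR UPDATE");
             PreparedStatement write = con.prepareStatement(
                 "UPDATE accounts SET balance = ? WHERE id = ?")) {

            read.setLong(1, accountId);
            BigDecimal balance;
            try (ResultSet rs = read.executeQuery()) {
                rs.next();
                balance = rs.getBigDecimal(1);
            }

            write.setBigDecimal(1, balance.add(amount));
            write.setLong(2, accountId);
            write.executeUpdate();
        }
        con.commit();
    }
}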
Is this assumption correct? Or do I have a wrong understanding of transaction isolation levels?
You might be annoyed, because this must be the millionth question about this topic. The problem is that the bank account example is absolutely critical, yet precisely there, where it should be perfectly clear what is going on, there seems to be so much misleading, contradictory information and so many misconceptions.
The problem here is that you are asking what Isolation Level, as defined by the SQL Standard, is needed to sort out a concurrency anomaly that is not part of this definition.
The SQL Standard only defines how the isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) map to the Dirty Read, Non-Repeatable Read and Phantom Read anomalies. There is no mention of Lost Update, so this, as you rightly pointed out, depends on how the isolation levels are implemented by a specific DBMS.
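For reference, here is the standard's anomaly matrix next to the corresponding JDBC constants; this is only the standard's minimum requirement, and a concrete DBMS may be stricter:

import java.sql.Connection;

public class IsolationLevels {
    public static void main(String[] args) {
        // SQL-standard levels and the anomalies they are required to prevent:
        // TRANSACTION_READ_UNCOMMITTED : may allow dirty reads, non-repeatable reads, phantoms
        // TRANSACTION_READ_COMMITTED   : prevents dirty reads
        // TRANSACTION_REPEATABLE_READ  : prevents dirty reads and non-repeatable reads
        // TRANSACTION_SERIALIZABLE     : prevents dirty reads, non-repeatable reads and phantoms
        // None of these rows says anything about lost updates.
        int level = Connection.TRANSACTION_REPEATABLE_READ; // e.g. connection.setTransactionIsolation(level)
        System.out.println("Chosen isolation level constant: " + level);
    }
}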
Supposedly REPEATABLE_READ is enough to prevent Lost Update on PostgreSQL, while SERIALIZABLE is needed to prevent it on MySQL and Oracle.
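A practical consequence if you rely on those stricter levels: the DBMS typically prevents the lost update by aborting one of the two transactions (PostgreSQL, for example, reports SQLSTATE 40001, a serialization failure), so the application has to retry. A rough sketch, where deposit() stands for the read-then-write logic from the question:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class RetryingDeposit {

    static final String SERIALIZATION_FAILURE = "40001";

    public static void depositWithRetry(String url, long accountId, BigDecimal amount)
            throws SQLException {
        while (true) {
            try (Connection con = DriverManager.getConnection(url, "user", "password")) {
                con.setAutoCommit(false);
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                deposit(con, accountId, amount);   // the read-then-write logic from the question
                return;                            // committed inside deposit, we are done
            } catch (SQLException e) {
                if (!SERIALIZATION_FAILURE.equals(e.getSQLState())) {
                    throw e;                       // not a conflict, give up
                }
                // Serialization conflict: another transaction won; simply try again.
            }
        }
    }

    // Placeholder so the sketch compiles; in the question this is the
    // SELECT-balance-then-UPDATE-and-commit logic.
    static void deposit(Connection con, long accountId, BigDecimal amount) throws SQLException {
        // ... SELECT balance, UPDATE balance + amount, con.commit() ...
    }
}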
Here are some interesting posts about Oracle and PostgreSQL/MySQL.