Writing an API in node.js with a MySQL database and I am implementing a fairly standard pattern of:
If exists then update
else insert
This of course works fine until multiple simultaneous requests are made to the SPI at which point the If exists
on request 2 can get executed before the insert of request 1, leading to two records instead of one.
I know that one way of dealing with this is ensure that the database has a constraint or key that prevents the duplicate record but in this case the rules that determine whether we should have an insert or update are more complicated and so the check needs to be done in code.
This sounded like a good case for using a mutex/lock. I need this to be distributed as the API may have multiple instances running as part of a pool/farm.
I've come up with the following implementation:
try {
await this.databaseConnection.knexRaw().raw(`SELECT GET_LOCK('lock1',10);`);
await this.databaseConnection.knexRaw().transaction(async (trx) => {
const existing = await this.findExisting(id);
if (existing) {
await this.update(myThing);
} else {
await this.insert(myThing);
}
});
} finally {
await this.databaseConnection.knexRaw().raw(`SELECT RELEASE_LOCK('lock1');`);
}
This all seems to work fine and my tests now produce only a single insert. Although it seems a bit brute force/manual. Being new to MySQL and node (I come from a C# and SQL Server background) is this approach sane? Is there a better approach?
Is it sane? Subjective.
Is it technically safe? It could be -- GET_LOCK()
is reliable -- but not as you have written it.
You are ignoring the return value of GET_LOCK()
, which is 1 if you got the lock, 0 if the timeout expired and you didn't get the lock, and NULL in some failure cases.
As written, you'll wait 10 seconds and then do the work anyway, so, not safe.
This assumes you have only one MySQL master. It wouldn't work if you have multiple masters or Galera, since Galera doesn't replicate GET_LOCK()
across all nodes. (A Galera cluster is a high availability MySQL/MariaDB/Percona cluster of writable masters that replicate synchronously and will survive the failure/isolation of up to (ceil(n/2) - 1) out of n total nodes).
It would be better to find and lock the relevant rows using SELECT ... FOR UPDATE
, which locks the found rows or, in some cases, the gap where they would be if they existed, blocking other transactions that are attempting to capture the same locks until you rollback or commit... but if that is not practical, using GET_LOCK()
is valid, subject to the point made above about the return value.