transactionsgoogle-cloud-spanner

Spanner - many smaller transactions vs fewer larger ones?


Context

I tried to seek the answer with search, but all what I found was for general SQL context, not specific from Spanner point of view.

There are like 150-200K of accounts overall.

There are 2 tables. One collects attempts of an action, and they are account-scoped. It will be not so much of them per account. Like up to 5 per a day. These records can be updated few times per day. Other table is used in like "read or append is not found" type of scenario, it describes some more repeating type of information across accounts and attempts of an action.

Question

What is better practice for Spanner perspective:

a) many single-action transactions? E. g. look for something in a read-only transactions, than add that missing record in different transaction?

b) fewer more-combined transactions? Like, seek for a thing and than add if not found it within same transaction?

In both cases data operation in question is small, single-row activity.

PS: I understand that it may sound as micro-optimization type of question, but as a sorta newbie I need an answer not for specific single situation, but more in "how it is better to do such things", like "what is best practice for this"? Or even "What is Spanner way of doing this"?


Solution

  • What is better practice for Spanner perspective:

    From which Spanner perspective? Consistency? Performance? Or something else?

    a) many single-action transactions? E. g. look for something in a read-only transactions, than add that missing record in different transaction?

    b) fewer more-combined transactions? Like, seek for a thing and than add if not found it within same transaction?

    The answer to this question really depends on your requirements as well:

    1. Assuming that you only care about what is the fastest: Then do all reads in a single read-only transaction. Buffer all rows that you need to insert/update. Then execute all buffered writes in a single read/write transaction. The downside of this, is that as the reads and the writes are spread across two transactions, which means that you have no transactional guarantees across those two transactions.
    2. Assuming that you don't want the additional complexity of first doing all reads and buffering everything that you need to write, and also don't need transactional consistency across your read and writes: Read a row in a single-use read-only transaction, and execute a single write (if any is needed) in a separate read/write transaction.
    3. Assuming that you want transactional consistency for your read and write of a row: Start a read/write transaction, read a row in that transaction, and then execute an insert/update in the same transaction. You should keep read/write transactions as short and quick as possible, as this reduces the amount of time that a row needs to be locked by Spanner.

    All the above three options can be considered 'the Spanner way of doing it', but once again depending on your requirements.

    What you should not do: Use a single, long-running read/write transaction to execute all reads and writes (unless your requirements include that the combination of all these reads and writes must be atomic). You should keep read/write transactions as short and quick as possible.