Tags: java, caching, guava

Is get() a thread-safe operation in Guava's cache?


I found out that put, and get with a CacheLoader, use a ReentrantLock under the hood, so why is this not implemented for the getIfPresent operation?

get, which is used by getIfPresent:

@Nullable
V get(Object key, int hash) {
    try {
        if (this.count != 0) {
            long now = this.map.ticker.read();
            ReferenceEntry<K, V> e = this.getLiveEntry(key, hash, now);
            if (e == null) {
                return null;
            }

            V value = e.getValueReference().get();
            if (value != null) {
                this.recordRead(e, now);
                return this.scheduleRefresh(e, e.getKey(), hash, value, now, this.map.defaultLoader);
            }

            this.tryDrainReferenceQueues();
        }

        return null;
    } finally {
        this.postReadCleanup();
    }
}

put

@Nullable
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    this.lock();
    .....

Is the only way to achieve thread safety for basic get/put operations to use synchronization on the client side?


Solution

  • Even if getIfPresent did use locks, that wouldn't help. The problem is more fundamental than that.

    Let me put that differently: Define 'threadsafe'.

    Here's an example of what can happen in a non-threadsafe implementation:
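As a minimal, hypothetical sketch (the class and method names are made up): two threads hammer a plain HashMap, which makes no thread-safety promises at all.

```java
import java.util.HashMap;
import java.util.Map;

public class UnsafeMapDemo {
    // Runs two threads that each put keys 0..n-1 into the given map,
    // then returns the final size.
    static int concurrentPuts(Map<Integer, Integer> map, int n) throws InterruptedException {
        Runnable writer = () -> {
            for (int i = 0; i < n; i++) {
                map.put(i, i);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // With a plain HashMap, concurrent puts can lose entries; on some
        // older JDKs the internal table could even be corrupted outright.
        // The printed size is unpredictable.
        System.out.println(concurrentPuts(new HashMap<>(), 100_000));
    }
}
```

Hand the same method a ConcurrentHashMap and it reliably ends up with exactly n entries; hand it a HashMap and all bets are off.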

    That's an example of NOT thread safe.

    Here is an example of perfectly fine:
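A sketch of that situation (hypothetical names): two threads race to write the same key of a thread-safe map.

```java
import java.util.concurrent.ConcurrentHashMap;

public class RacyButSafeDemo {
    // Two threads race to write the same key of a thread-safe map.
    static int raceForKey() throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread a = new Thread(() -> map.put("k", 1));
        Thread b = new Thread(() -> map.put("k", 2));
        a.start();
        b.start();
        a.join();
        b.join();
        // Which write "wins" is unpredictable, but the map is never
        // corrupted: the result is always exactly 1 or exactly 2.
        return map.get("k");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(raceForKey());
    }
}
```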

    That's fine. That's still thread safe. Thread safe doesn't mean every interaction with an instance of that data type can be understood in terms of 'first this thing happened, then that thing happened'. Wanting that is very problematic, because the only way the computer can really give you that kind of guarantee is to disable all but a single core and run everything very very slowly. The point of a cache is to speed things up, not slow things down!

    The problem with the lack of guarantees here is that if you run multiple separate operations on the same object, you run into trouble. Here's some pseudocode for a bank ATM that will go epically wrong in the long run:
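One way that pseudocode might look in Java (hypothetical names; an AtomicLong stands in for the account store, so every individual step is thread-safe on its own):

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtmDemo {
    // Balance in cents. AtomicLong makes each single read or update
    // thread-safe on its own.
    final AtomicLong balance;

    AtmDemo(long initialCents) {
        this.balance = new AtomicLong(initialCents);
    }

    // Each step is individually thread-safe, but the check-then-act
    // sequence as a whole is NOT atomic: between the check and the
    // update, another thread (say, the teller) can drain the account.
    boolean withdraw(long amount) {
        if (balance.get() >= amount) {     // fetch balance, check funds
            dispenseCash(amount);          // spit out cash
            balance.addAndGet(-amount);    // update balance -- too late!
            return true;
        }
        return false;
    }

    void dispenseCash(long amount) {
        // hardware call; cannot be undone once the user grabs the bill
    }
}
```

Two concurrent withdraw(100_00) calls against a balance of 100_00 can both pass the check, both dispense cash, and drive the balance negative.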

    Every individual operation there is perfectly threadsafe. And yet this is going to go very, very wrong: if the user uses their card at the ATM at the same time they are in the bank withdrawing money via the teller, either the bank or the user is going to get very lucky. I'd hope it's obvious how and why.

    The upshot is: If you have dependencies between operations there is nothing you can do with 'threadsafe' concepts that can possibly fix it; the only way is to actually write code that explicitly marks off these dependencies.

    The only way to write that bank code is to use some form of locking. Basic locking or optimistic locking, either way is fine, but locking of some sort. It has to look like[2]:

    start some sort of transaction;
    fetch account balance;
    deal with insufficient funds;
    spit out cash;
    update account balance;
    end transaction;
    
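A sketch of that shape in Java, using a plain ReentrantLock as the "transaction" (hypothetical names):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockedAtmDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance; // cents

    LockedAtmDemo(long initialCents) {
        this.balance = initialCents;
    }

    boolean withdraw(long amount) {
        lock.lock();                    // start some sort of transaction
        try {
            if (balance < amount) {     // deal with insufficient funds
                return false;
            }
            dispenseCash(amount);       // spit out cash
            balance -= amount;          // update account balance
            return true;
        } finally {
            lock.unlock();              // end transaction
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    private void dispenseCash(long amount) {
        // hardware call
    }
}
```

Because the whole check-dispense-update sequence runs under one lock, no other thread can slip in between the check and the update.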

    Now guava's code makes perfect sense: getIfPresent is a single read, implemented (much as in ConcurrentHashMap) over volatile fields, so it needs no lock to be thread-safe; put mutates the segment's internal structure, so it does lock. Both are thread-safe on their own. And if your actual problem is a compound read-then-write across multiple calls, a lock inside getIfPresent could not have helped you anyway.
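If what you actually need is an atomic check-then-store, reach for an operation that is atomic as a whole. Guava's Cache offers get(key, loader) for exactly this; the JDK analogue, shown here as a self-contained sketch, is ConcurrentHashMap.computeIfAbsent:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCompoundDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();

        // computeIfAbsent bundles check + compute + store into one atomic
        // per-key operation -- the same role Guava's cache.get(key, loader)
        // plays for its cache.
        for (int i = 0; i < 3; i++) {
            cache.computeIfAbsent("answer", k -> {
                loads.incrementAndGet(); // the loader runs only once
                return 42;
            });
        }
        System.out.println(cache.get("answer") + ", loaded " + loads.get() + "x");
    }
}
```

Three lookups, one load: the second and third calls see the stored value and never invoke the loader.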

    [1] Often log frameworks end up injecting an actual explicit lock into the proceedings, and thus you do often get guarantees in this case, but only 'by accident', because of the log framework. This isn't a guarantee (maybe you're logging to separate log files, for example!) and often what you 'witness' may be a lie. For example, maybe you have 2 log statements that log to separate files (and don't lock each other out at all), and they log the timestamp as part of the message. The fact that one log line says '12:00:05' and the other says '12:00:06' means nothing - a log thread fetches the current time, creates a string describing the message, and tells the OS to write it to the file. You obviously get absolutely no guarantee that the 2 log threads run at identical speed. Maybe one thread fetches the time (12:00:05), creates the string, wants to write to the disk, but the OS switches to the other thread before the write goes through; the other thread is the other logger, and it reads the time (12:00:06), makes the string, writes it out, and finishes up; then the first logger continues and writes its content. Tada: 2 threads where you 'observe' that one thread is 'earlier', but that is incorrect. Perhaps this example further highlights why thinking about threads in terms of which one is 'first' steers you wrong.

    [2] This code has the additional complication that you're interacting with systems that cannot be transactional. The point of a transaction is that you can abort it; you cannot abort the user grabbing a bill from the ATM. You solve that by logging that you're about to spit out the money, then spit out the money, then log that you have spit out the money. And finally write to this log that it has been processed in the user's account balance. Other code needs to check this log and act accordingly. For example, on startup the bank's DB machine needs to flag 'dangling' ATM transactions and will have to get a human to check the video feed. This solves the problem where someone trips over the power cable of the bank DB machine juuust as the user is about to grab the banknote from the machine.