I found out that put and get with CacheLoader operations use Reentrant lock under the hood, but why this is not implemented for getIfPresent operation?
get which is used by getIfPresent
@Nullable
V get(Object key, int hash) {
try {
if (this.count != 0) {
long now = this.map.ticker.read();
ReferenceEntry<K, V> e = this.getLiveEntry(key, hash, now);
Object value;
if (e == null) {
value = null;
return value;
}
value = e.getValueReference().get();
if (value != null) {
this.recordRead(e, now);
Object var7 = this.scheduleRefresh(e, e.getKey(), hash, value, now, this.map.defaultLoader);
return var7;
}
this.tryDrainReferenceQueues();
}
Object var11 = null;
return var11;
} finally {
this.postReadCleanup();
}
}
put
@Nullable
V put(K key, int hash, V value, boolean onlyIfAbsent) {
this.lock();
.....
Is the only thing I can do to reach thread-safety in basic get/put operations is to use synchronization on client ?
Even if getIfPresent
did use locks, that won't help. It's more fundamental than that.
Let me put that differently: Define 'threadsafe'.
Here's an example of what can happen in a non-threadsafe implementation:
.put
on a plain jane j.u.HashMap
, not holding any locks..get(k)
on that map with the second thread's key doesn't find it eventhough it is returned in the .entrySet()
. This makes no sense and breaks all rules of j.u.HashMap. The spec of hashmap does not explain any of this, other than 'I am not threadsafe' and leaves it at that.That's an example of NOT thread safe.
Here is an example of perfectly fine:
That's fine. That's still thread safe. Thread safe doesn't mean every interaction with an instance of that data type can be understood in terms of 'first this thing happened, then that thing happened'. Wanting that is very problematic, because the only way the computer can really give you that kind of guarantee is to disable all but a single core and run everything very very slowly. The point of a cache is to speed things up, not slow things down!
The problem with the lack of guarantees here is that if you run multiple separate operations on the same object, you run into trouble. Here's some pseudocode for a bank ATM machine that will go epically wrong in the long run:
Map<Account, Integer>
(maps account ID to cents in account)..put(acct, balance - 5000)
.Everything perfectly threadsafe. And yet this is going to go very very wrong - if the user uses their card at the same time they are in the bank withdrawing money via the teller, either the bank or the user is going to get very lucky here. I'd hope it's obvious to see how and why.
The upshot is: If you have dependencies between operations there is nothing you can do with 'threadsafe' concepts that can possibly fix it; the only way is to actually write code that explicitly marks off these dependencies.
The only way to write that bank code is to either use some form of locking. Basic locking, or optimistic locking, either way is fine, but locking of some sort. It has to look like2:
start some sort of transaction;
fetch account balance;
deal with insufficient funds;
spit out cash;
update account balance;
end transaction;
Now guava's code makes perfect sense:
There is no such thing as 'earlier' and 'later'. You need to stop thinking about multicore in that way. Unless you explicitly write primitives that establish these things. The cache interface does have these. Use the right operation! getIfPresent
will get you the cache if it is possible for your current thread to get at that data. If it is not, it returns null
, that's what that call does.
If instead you want this common operation: "Get me the cached value. However, if it is not available, then run this code to calculate the cached value, cache the result, and return it to me. In addition, ensure that if 2 threads simultaneously end up running this exact operation, only one thread runs the calculation, and the other will wait for the other one (don't say 'first' one, that's not how you should think about threads) to finish, and use that result instead".. then, use the right call for that: .cache.get(key, k -> calculateValueForKey(k))
. As the docs explicitly call out this will wait for another thread that is also 'loading' the value (that's what guava cache calls the calculation process).
No matter what you invoke from the Cache API, you can't 'break it', in the sense that I broke that HashMap. The cache API does this partly by using locks (such as ReentrantLock
for mutating operations on it), and partly by using a ConcurrentHashMap
under the hood.
[1] Often log frameworks end up injecting an actual explicit lock in the proceedings and thus you do often get guarantees in this case, but only 'by accident' because of the log framework. This isn't a guarantee (maybe you're logging to separate log files, for example!) and often what you 'witness' may be a lie. For example, maybe you have 2 log statements that both log to separate files (and don't lock each other out at all), and they log the timestamp as part of the log. The fact that one log line says '12:00:05' and the other says '12:00:06' means nothing - the log thread fetches the current time, creates a string describing the message, and tells the OS to write it to the file. You obviously get absolutely no guarantee that the 2 log threads run at identical speed. Maybe one thread fetches the time (12:00:05), creates the string, wants to write to the disk but the OS switches to the other thread before the write goes through, the other thread is the other logger, it reads time (12:00:06), makes the string, writes it out, finishes up, and then the first logger continues, writes its context. Tada: 2 threads where you 'observe' one thread is 'earlier' but that is incorrect. Perhaps this example will further highlight why thinking about threads in terms of which one is 'first' steers you wrong.
[2] This code has the additional complication that you're interacting with systems that cannot be transactional. The point of a transaction is that you can abort it; you cannot abort the user grabbing a bill from the ATM. You solve that by logging that you're about to spit out the money, then spit out the money, then log that you have spit out the money. And finally write to this log that it has been processed in the user's account balance. Other code needs to check this log and act accordingly. For example, on startup the bank's DB machine needs to flag 'dangling' ATM transactions and will have to get a human to check the video feed. This solves the problem where someone trips over the power cable of the bank DB machine juuust as the user is about to grab the banknote from the machine.