javaguavajmhgoogle-guava-cache

Iterating on values from Guava Cache loses data


I started benchmarking ways to find key by value in Guava cache and I noticed strange behaviour related to concurrency level. I'm not sure if that's a bug or undefined behaviour or maybe even expected but not specified.

My benchmark is supposed to find key by value in Guava Cache, not an usual thing to do, I know.

That's my complete benchmark class:

@Fork(4)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {

    private Long counter = 0L;

    private final int MAX = 2500;

    private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(1)
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });

    private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
    private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);

    @Setup(Level.Trial)
    public void setup() {
        // Populate guava cache
        for(int i = 0; i <= MAX; i++) {
            try {
                stringToLong.get(UUID.randomUUID().toString());
            } catch (ExecutionException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    @Benchmark
    public String stringToIdByIteration() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationHashTable() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

@Benchmark
    public String stringToIdByIterationConcurrentHashMap() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
        return null;
    }

    private Long generateIdByString(final String mString) {
        mHashMap.put(mString, counter++);
        concurrentHashMap.put(mString, counter);
        return counter;
    }

}

What I noticed is when I change .concurrencyLevel(1) to number different than 1 I'm starting to lose data. The following output is from concurrency level 4:

Iteration   1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op

I noticed I never lose any data when using HashMap or HashTable for using the same code, it also performs much better:

Benchmark Mode Cnt Score Error Units ValueByKey.stringToIdByIteration avgt 16 58.637 ± 15.094 us/op ValueByKey.stringToIdByIterationConcurrentHashMap avgt 16 16.148 ± 2.046 us/op ValueByKey.stringToIdByIterationHashTable avgt 16 11.705 ± 1.095 us/op

Is my code wrong or is that Guava can't properly handle partitioned HashTable with concurrency level higher than 1?

  • The concurrency level option is used to partition the table internally such that updates can occur without contention.
  • The ideal setting would be the maximum number of threads that could potentially access the cache at one time.

Solution

  • No cache guarantees cache-hits all of the time

    Presence/absence of data in cache is determined by eviction policy (and data being loaded into cache in the first place).

    Since you have used CacheBuilder.maximumSize(MAX + 5) your cache will use the size-based eviction and will start removing elements before it reaches the preset maximum size.

    With concurrency level set to 4, Guava Cache plays it safe and and sets the threshold of eviction a bit lower, which makes sense as the elements can keep arriving as they are being evicted.

    This is why your elements start 'disappearing'.

    To test this, make your class implement RemovalListener interface:

    public class ValueByKey implements RemovalListener<String, Long> { 
        //...
        @Override
        public void onRemoval(RemovalNotification<String, Long> notification) {
            System.out.println("removed: " + notification.getKey() + " -> " + notification.getValue());
        }
        //...
    }
    

    ...and while running the tests, you will notice evictions which match missing values:

    # Warmup Iteration   1: 
    removed: 110c0a73-1dc3-40ee-8909-969e6dee0ea0 -> 3
    removed: 6417015a-f154-467f-b3bf-3b95831ac5b7 -> 6
    removed: 5bc206f9-67ec-49a2-8471-b386ffc03988 -> 14
    removed: 3c0a33e1-1fe1-4e42-b262-bf6a3e8c53f7 -> 21
    Returning null as value not found 14
    Returning null as value not found 14
    Returning null as value not found 3
    64.778 us/op
    Iteration   1: 
    Returning null as value not found 21
    Returning null as value not found 21
    Returning null as value not found 6
    37.719 us/op
    [...]
    

    I can imagine that the threshold calculation for eviction could be complex, but on my machine upping the maximum size by 5% (CacheBuilder.maximumSize(Math.round(MAX * 1.05))) prevented ALL evictions when running your benchmarks.