javaperformanceguavamultimapsortedmap

Guava Multimaps.filterKeys poor performance vs NavigableMap.subMap()


I have a Guava TreeMultimap which maps Date keys to some value.

I want to be able to filter this map to find where the key is within a certain date range.

As the map is sorted, it should be possible to do this very quickly without evaluating all of the keys, but when I first wrote this using Multimaps.filterKeys it was much slower than expected, suggesting it was evaluating every key. When I rewrote this to use NavigableMap.subMap(), performance was fantastic and where I expected it to be.

The Multimaps.filterKeys syntax is much nicer and it's what I'd like to use, especially as I'm already working with a Multimap.

Please see a minimised example of what I'm doing below:

import java.text.MessageFormat;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import com.google.common.collect.Multimaps;
import com.google.common.collect.Range;
import com.google.common.collect.TreeMultimap;

public class MapPerformanceTest
{
    public static void main(final String[] args)
    {
        System.out.println("Populating Map...");

        final TreeMultimap<Date, Integer> map = TreeMultimap.create();

        for (int i = 0; i < 20000000; i++)
        {
            map.put(new Date(i), i);
        }

        final Date[] range = {new Date(10), new Date(20)};

        System.out.println("Map Populated");
        System.out.println();

        long tempTime = -System.nanoTime();

        System.out.println(MessageFormat.format("Multimaps.filterKeys() attempt #1 returned {0} keys in {1} milliseconds",
                Multimaps.filterKeys(map, Range.closed(range[0], range[1])).size(),
                TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));

        tempTime = -System.nanoTime();

        System.out.println(MessageFormat.format("NavigableMap.subMap() attempt #1 returned {0} keys in {1} milliseconds",
                map.asMap().subMap(range[0], true, range[1], true).size(), TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));

        tempTime = -System.nanoTime();

        System.out.println(MessageFormat.format("NavigableMap.subMap() attempt #2 returned {0} keys in {1} milliseconds",
                map.asMap().subMap(range[0], true, range[1], true).size(), TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));

        tempTime = -System.nanoTime();

        System.out.println(MessageFormat.format("Multimaps.filterKeys() attempt #2 returned {0} keys in {1} milliseconds",
                Multimaps.filterKeys(map, Range.closed(range[0], range[1])).size(),
                TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));
    }
}

The output is:

Multimaps.filterKeys() attempt #1 returned 11 keys in 1,418 milliseconds
NavigableMap.subMap() attempt #1 returned 11 keys in 1 milliseconds
NavigableMap.subMap() attempt #2 returned 11 keys in 0 milliseconds
Multimaps.filterKeys() attempt #2 returned 11 keys in 946 milliseconds

Solution

  • Your observation that filtering is slower than iterating a submap is correct. And the explanation is filtering does involve evaluating every key, as you suspect.

    This is inherent to the filtering approach. The filterKeys method is not able to look inside the supplied filter to determine what it does. It is therefore necessary to apply the filter to all of the keys.

    Now if the compiler had deep knowledge of what the method and the filter are doing, it would be possible in theory for it to transform the filtering into something more efficient. Unfortunately it doesn't, so it is necessary to use the more cumbersome approach of submaps ... if you want good performance when operating on a map with many keys.