I have a Guava TreeMultimap
which maps Date
keys to some value.
I want to be able to filter this map
to find where the key is within a certain date range.
As the map
is sorted, it should be possible to do this very quickly without evaluating all of the keys, but when I first wrote this using Multimaps.filterKeys
it was much slower than expected, suggesting it was evaluating every key. When I rewrote this to use NavigableMap.subMap()
, performance was fantastic and where I expected it to be.
The Multimaps.filterKeys
syntax is much nicer and it's what I'd like to use, especially as I'm already working with a Multimap
.
Please see a minimised example of what I'm doing below:
import java.text.MessageFormat;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import com.google.common.collect.Multimaps;
import com.google.common.collect.Range;
import com.google.common.collect.TreeMultimap;
public class MapPerformanceTest
{
public static void main(final String[] args)
{
System.out.println("Populating Map...");
final TreeMultimap<Date, Integer> map = TreeMultimap.create();
for (int i = 0; i < 20000000; i++)
{
map.put(new Date(i), i);
}
final Date[] range = {new Date(10), new Date(20)};
System.out.println("Map Populated");
System.out.println();
long tempTime = -System.nanoTime();
System.out.println(MessageFormat.format("Multimaps.filterKeys() attempt #1 returned {0} keys in {1} milliseconds",
Multimaps.filterKeys(map, Range.closed(range[0], range[1])).size(),
TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));
tempTime = -System.nanoTime();
System.out.println(MessageFormat.format("NavigableMap.subMap() attempt #1 returned {0} keys in {1} milliseconds",
map.asMap().subMap(range[0], true, range[1], true).size(), TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));
tempTime = -System.nanoTime();
System.out.println(MessageFormat.format("NavigableMap.subMap() attempt #2 returned {0} keys in {1} milliseconds",
map.asMap().subMap(range[0], true, range[1], true).size(), TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));
tempTime = -System.nanoTime();
System.out.println(MessageFormat.format("Multimaps.filterKeys() attempt #2 returned {0} keys in {1} milliseconds",
Multimaps.filterKeys(map, Range.closed(range[0], range[1])).size(),
TimeUnit.MILLISECONDS.convert(tempTime + System.nanoTime(), TimeUnit.NANOSECONDS)));
}
}
The output is:
Multimaps.filterKeys() attempt #1 returned 11 keys in 1,418 milliseconds
NavigableMap.subMap() attempt #1 returned 11 keys in 1 milliseconds
NavigableMap.subMap() attempt #2 returned 11 keys in 0 milliseconds
Multimaps.filterKeys() attempt #2 returned 11 keys in 946 milliseconds
Your observation that filtering is slower than iterating a submap is correct. And the explanation is filtering does involve evaluating every key, as you suspect.
This is inherent to the filtering approach. The filterKeys
method is not able to look inside the supplied filter to determine what it does. It is therefore necessary to apply the filter to all of the keys.
Now if the compiler had deep knowledge of what the method and the filter are doing, it would be possible in theory for it to transform the filtering into something more efficient. Unfortunately it doesn't, so it is necessary to use the more cumbersome approach of submaps ... if you want good performance when operating on a map with many keys.