performancegroovy

Performant comparison of two hashmaps


I'm performing in Groovy a compare of two hashmaps as in example below

 h1 = [key1 : [att1 : 1, att2 : 2], key2 : [att1 : 10, att2 : 20], key3 : [att1 : 3, att2 : 4]]
 h2 = [key1 : [att1 : 1, att2 : 3], key2 : [att1 : 10, att2 : 20]]

and I need the result as folows

 [key1:[att1:true, att2:false], key2:[att1:true, att2:true], key3:[att1:false, att2:false]]
 

I.e. if the attributes of a given key match they should be marked as true, if they do not match or the key is missing in the second source they should be set to false

I tried this nested closures,

 diff = h1.collectEntries {key, value -> [key,  (value.collectEntries {k, v -> [k, (v == (h2[key] ? h2[key][k] : null) ) ] } ) ] }

which works, but for large maps scale not well, is there a better performant solution in Groovy?

Note if this may be relavent, the attributes are same in all occurencies and known in advance.


Solution

  • I don't think there is much more you can do other than "compare the maps"; at best this is for testing and short-circuiting is an option. Or you can utilize a library that can build up a diff.

    That said, there are improvements to your code.

    Starting from h1 means, that you will not find differences in keys only in h2 (e.g. swap h1 and h2 and keys3 will vanish from the result). So finding the unions of all involved keys sets will solve that (see compareMaps, def keys).

    Next you can be very aggressive with the comparison for the final boolean, if you treat the following equally:

    (see the closure in compareMapValues)

    E.g.

    def compareMapValues(ks, m1, m2) {
        ks.collectEntries{ k -> [k, m1?.get(k)==m2?.get(k)] }
    }
    
    def compareMaps(m1, m2) {
        // according to OP known and could as well be passed in
        // so this is just to make the demo work
        def nestedKeys = [m1,m2].collectMany{ it.values().collectMany{ it.keySet() }}.toSet()
    
        def keys = [m1,m2].collectMany{ it.keySet() }.toSet()
        keys.collectEntries{ k -> [k, compareMapValues(nestedKeys, m1[k], m2[k])] }
    }
    
    
    def h1 = [key1 : [att1 : 1, att2 : 2], key2 : [att1 : 10, att2 : 20], key3 : [att1 : 3, att2 : 4]]
    def h2 = [key1 : [att1 : 1, att2 : 3], key2 : [att1 : 10, att2 : 20]]
    
    println(compareMaps(h1,h2))
    // → [key1:[att2:false, att1:true], key2:[att2:true, att1:true], key3:[att2:false, att1:false]]