
How do I merge two dictionaries, keeping the max value for common keys?


I have two dictionaries that look like:

{'r': 2, 'e': 4, 'h': 2, 'k': 4}

and

{'r': 2, 'e': 5, 'y': 2, 'h': 2}

How do I get a dictionary that has all the keys, but where a key appears in both initial dictionaries, keeps the higher value for that key? I want a dictionary that looks like this:

{'e': 5, 'k': 4, 'y': 2, 'h': 2, 'r': 2}

None of the previous answers helped me.


Solution

  • You can use itertools.chain to combine the items of both dictionaries, then itertools.groupby to collect the values for each key, and take the max of each group. groupby only groups consecutive items, so the merged items must be sorted by key first. I'm using operator.itemgetter to get the keys and values; you could replace it with lambdas to avoid the extra import, but I wouldn't advise it: operator is part of the standard library, and the lambdas are slower.

    from itertools import chain, groupby
    from operator import itemgetter
    
    data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
    data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
    
    get_key, get_val = itemgetter(0), itemgetter(1)
    merged_data = sorted(chain(data1.items(), data2.items()), key=get_key)
    
    output = {k: max(map(get_val, g)) for k, g in groupby(merged_data, key=get_key)}
    
    print(output)
    

    {'e': 5, 'h': 2, 'k': 4, 'r': 2, 'y': 2}
    

    Another alternative is collections.defaultdict. To get the correct output even when there are negative values, use float('-inf') as the default value:

    from collections import defaultdict
    
    output = defaultdict(lambda: float('-inf'))
    
    for d in (data1, data2):
        for k, v in d.items():
            output[k] = max(output[k], v)
    
    print(dict(output))
    

    {'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
    

    Or, without any imports, dict.setdefault can take the place of defaultdict:

    output = {}
    
    for d in (data1, data2):
        for k, v in d.items():
            output.setdefault(k, float('-inf'))
            output[k] = max(output[k], v)
            
    print(output)
    

    {'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
    

    Lastly, using pandas:

    import pandas as pd
    
    data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
    data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
        
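    # Keys missing from one dict become NaN, which max() skips; astype(int) undoes
    # the resulting float upcast (it would also truncate any non-integer values).
    res = pd.concat(map(pd.DataFrame, ([data1], [data2]))).max().astype(int).to_dict()
    
    print(res)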
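
If you need this in more than one place, the loop-based approaches above can be wrapped in a small helper. The following is only a minimal sketch: merge_max is a hypothetical name (it is not a built-in or library function), and it assumes the values of a shared key can be compared with >. Because it checks for the key explicitly, it needs no float('-inf') sentinel, and it accepts any number of dictionaries.

```python
def merge_max(*dicts):
    """Merge any number of dicts, keeping the largest value seen for each key.

    Hypothetical helper for illustration; assumes values sharing a key
    are comparable with ``>``.
    """
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # Store the value if the key is new or it beats the current one.
            if key not in merged or value > merged[key]:
                merged[key] = value
    return merged


data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}

result = merge_max(data1, data2)
print(result)  # {'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
```

Neither input dictionary is modified, and keys appear in the order they are first seen.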