I have two dictionaries that look like:
{'r': 2, 'e': 4, 'h': 2, 'k': 4}
and
{'r': 2, 'e': 5, 'y': 2, 'h': 2}
How do I get a dictionary that has all the keys from both, but where a key that appears in both initial dictionaries keeps the higher value? I want a dictionary that looks like this:
{'e': 5, 'k': 4, 'y': 2, 'h': 2, 'r': 2}
None of the previous answers helped me.
You can use itertools.chain to combine the items of both dictionaries, then itertools.groupby to collect all the values for each individual key, and take the max of those values. groupby only groups consecutive items with the same key, so the merged data has to be sorted by key first. I'm using operator.itemgetter to get the keys and values instead of lambdas. You could replace it with lambdas if you don't want the extra import, although I wouldn't advise it: itemgetter is part of the standard library, and the lambdas are slower.
from itertools import chain, groupby
from operator import itemgetter
data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
get_key, get_val = itemgetter(0), itemgetter(1)
merged_data = sorted(chain(data1.items(), data2.items()), key=get_key)
output = {k: max(map(get_val, g)) for k, g in groupby(merged_data, key=get_key)}
print(output)
{'e': 5, 'h': 2, 'k': 4, 'r': 2, 'y': 2}
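To see why the sort matters, here is a small sketch using the same data. groupby starts a new group every time the key changes, so on unsorted input the same key can end up in more than one group, and max would only see part of its values. The names pairs, unsorted_keys and sorted_keys are just for illustration.

```python
from itertools import chain, groupby
from operator import itemgetter

data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
get_key = itemgetter(0)

# All (key, value) pairs from both dicts, in insertion order.
pairs = list(chain(data1.items(), data2.items()))

# Unsorted: equal keys are not adjacent, so a key can start several groups.
unsorted_keys = [k for k, _ in groupby(pairs, key=get_key)]

# Sorted by key: every key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(pairs, key=get_key), key=get_key)]

print(unsorted_keys)  # ['r', 'e', 'h', 'k', 'r', 'e', 'y', 'h']
print(sorted_keys)    # ['e', 'h', 'k', 'r', 'y']
```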
Another alternative is collections.defaultdict. To make sure the output is also correct when the values are negative, use float('-inf') as the default value:
from collections import defaultdict
output = defaultdict(lambda: float('-inf'))
for d in (data1, data2):
    for k, v in d.items():
        # Keep the larger of the stored value and the new one.
        output[k] = max(output[k], v)
print(dict(output))
{'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
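The choice of default is easiest to see with negative values. The sketch below uses made-up inputs (neg1 and neg2 are not from the question) and a hypothetical helper, merge_max, that runs the same loop with a configurable default. With 0 as the default, every negative value loses to the default; with float('-inf') the real maximum survives.

```python
from collections import defaultdict

# Hypothetical inputs with negative values, for illustration only.
neg1 = {'a': -3, 'b': -1}
neg2 = {'a': -5, 'c': -2}

def merge_max(dicts, default):
    """Same loop as above, with the defaultdict default passed in."""
    out = defaultdict(lambda: default)
    for d in dicts:
        for k, v in d.items():
            out[k] = max(out[k], v)
    return dict(out)

with_zero = merge_max((neg1, neg2), 0)
with_neg_inf = merge_max((neg1, neg2), float('-inf'))

print(with_zero)     # {'a': 0, 'b': 0, 'c': 0}
print(with_neg_inf)  # {'a': -3, 'b': -1, 'c': -2}
```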
Or, without any imports, dict.setdefault can take the place of defaultdict:
output = {}
for d in (data1, data2):
    for k, v in d.items():
        # Insert the key with -inf if it is missing, then keep the larger value.
        output.setdefault(k, float('-inf'))
        output[k] = max(output[k], v)
print(output)
{'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
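If you need this in more than one place, the same loop can be wrapped in a small function. This is only a sketch: merge_keep_max is a name I made up, and it checks for the key directly instead of using a float('-inf') placeholder, so it should also work for values that can't be compared with a float (strings, for example).

```python
def merge_keep_max(*dicts):
    """Merge any number of dicts, keeping the largest value seen per key."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # Store the value if the key is new or the value is larger.
            if key not in merged or value > merged[key]:
                merged[key] = value
    return merged

data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}

result = merge_keep_max(data1, data2)
print(result)  # {'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
```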
Lastly, using pandas:
import pandas as pd
data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
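# Turn each dict into a one-row DataFrame and stack them; keys missing from
# one dict become NaN, which max() skips. astype(int) undoes the float
# conversion that the NaN values cause.
res = pd.concat(map(pd.DataFrame, ([data1], [data2]))).max().astype(int).to_dict()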