python dictionary merge combiners

Combine multiple dictionaries by keys and values?


After hours of trying, and after following suggestions from other posts, I still cannot solve my problem. I have to manage many dictionaries (so far, this is the only approach I know how to make work).

Of the four dictionaries I want to combine, three (d1, d2 and d3) have the same keys.

d1 = {key1: [x1, x2, x3], key2: [y1, y2, y3], key3: [z1, z2, z3]}
d2 = {key1: [x4, x5, x6], key2: [y4, y5, y6], key3: [z4, z5, z6]}
d3 = {key1: [x7, x8, x9], key2: [y7, y8, y9], key3: [z7, z8, z9]}
d4 = {x2: [a, b, c], y2: [d, e, f], z2: [g, h, i]}

The fourth dictionary is generated from a reference file containing metadata, and each of its keys is equal to one of the values in d1. I want to create a dictionary with the information from d1, d2 and d3, and then include the information from d4 in the final dictionary:

final_dict = {key1: [x1, a, b, x2, x3, x4, x5, x8, x9],
              key2: [y1, d, e, y2, y3, y4, y5, y8, y9],
              key3: [z1, g, h, z2, z3, z4, z5, z8, z9]}

and then print it in tabular format, like this:

key1  x1  a  b  x2  x3  x4  x5  x8  x9
key2  y1  d  e  y2  y3  y4  y5  y8  y9
key3  z1  g  h  z2  z3  z4  z5  z8  z9

At the moment I have a dirty script that "works".

#!/usr/bin/env python

with open("file1.txt", "r") as file1, open("file2.txt", "r") as file2, \
     open("file3.txt", "r") as file3, open("file4.txt", "r") as file4:

    d1 = {}
    d2 = {}
    d3 = {}
    d4 = {}
    dicts = [d1, d2, d3, d4]

    #d1 = {key1: [x1, x2, x3], key2: [y1, y2, y3], key3: [z1, z2, z3]}
    #d2 = {key1: [x4, x5, x6], key2: [y4, y5, y6], key3: [z4, z5, z6]}
    #d3 = {key1: [x7, x8, x9], key2: [y7, y8, y9], key3: [z7, z8, z9]}
    #d4 = {x2: [a, b, c], y2: [d, e, f], z2: [g, h, i]}

    for b in file1:
        row = b.strip().split('\t')
        if row[0] not in d1:
            d1[row[0]] = row[1], row[3], row[4]

    for c in file2:
        row = c.strip().split('\t')
        if row[0] not in d2:
            d2[row[0]] = row[1:]

    for f in file3:
        row = f.strip().split('\t')
        if row[0] not in d3:
            d3[row[0]] = row[1:]

    for m in file4:
        row = m.strip().split('\t')
        if row[0] not in d4:
            d4[row[0]] = row[1], row[3], row[2]

    final_dict = {}
    for k in (dicts):
        for key, value in k.iteritems():
            final_dict[key].append(value)

    print final_dict

    #key1  x1  a  b  x2  x3  x4  x5  x8  x9
    #key2  y1  d  e  y2  y3  y4  y5  y8  y9
    #key3  z1  g  h  z2  z3  z4  z5  z8  z9

The problem is the last 3 lines.

Because I lack deep knowledge, simple suggestions (for dummies) would be appreciated.


Solution

  • I think this is what you're looking for, although it is still unclear why variables like x6, x7, y6, y7, etc. were excluded:

    First, make these names (e.g. x1, x2, key1) exist as variables, each holding its own name as a string, so the results are easier to track later:

    values = [letter + str(number) for letter in 'xyz' for number in range(1, 10)] + ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'] + ['key1', 'key2', 'key3']
    for v in values:
        exec('%s = "%s"' % (v, v))
    

    Next, let's instantiate your dictionaries:

    d1 = {key1: [x1, x2, x3], key2: [y1, y2, y3], key3: [z1, z2, z3]}  
    d2 = {key1: [x4, x5, x6], key2: [y4, y5, y6], key3: [z4, z5, z6]}
    d3 = {key1: [x7, x8, x9], key2: [y7, y8, y9], key3: [z7, z8, z9]}
    d4 = {x2: [a, b, c], y2: [d, e, f], z2: [g, h, i]}
    

    Then, let's merge the dictionaries into one, big, final dict:

    new_dict = {}
    for d in [d1, d2, d3]:
        for key in d:
            if key not in new_dict:
                # if key not yet in the dict, add a copy of the list,
                # so that the later += does not modify d1 in place
                new_dict[key] = list(d[key])
            else:
                # if key already there, then we'll just add the lists together
                new_dict[key] += d[key]
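
The same merge can also be written with collections.defaultdict, which creates the empty list the first time a key is seen. The following is only a minimal sketch, using string stand-ins for the question's placeholder values (an assumption). It also avoids the KeyError that final_dict[key].append(value) raises in the question's script, where final_dict starts out empty, and it uses extend rather than append so that each key ends up with one flat list.

```python
from collections import defaultdict

# String stand-ins for the question's placeholder values (an assumption).
d1 = {'key1': ['x1', 'x2', 'x3'], 'key2': ['y1', 'y2', 'y3']}
d2 = {'key1': ['x4', 'x5', 'x6'], 'key2': ['y4', 'y5', 'y6']}
d3 = {'key1': ['x7', 'x8', 'x9'], 'key2': ['y7', 'y8', 'y9']}

merged = defaultdict(list)  # a missing key starts as an empty list
for d in (d1, d2, d3):
    for key, values in d.items():
        # extend() copies the items, so d1, d2 and d3 are left unchanged
        merged[key].extend(values)

merged = dict(merged)  # convert back to a plain dict
```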
    

    Finally, to add the first two letters from each matching d4 entry, we can try this:

    for key in new_dict:
        for other_key in d4:
            if other_key in new_dict[key]:
                new_dict[key] += d4[other_key][:2]
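
The loop above appends the d4 values to the end of each list. If they should instead sit immediately before the matching value, as the question's desired output suggests (x1, a, b, x2, ...), one option is to rebuild each list while walking through it. This is a sketch under that assumption; insert_metadata is a hypothetical helper name, and the sample data are string stand-ins.

```python
def insert_metadata(merged, metadata, count=2):
    """Return a new dict in which the first `count` metadata items are
    placed immediately before each value that is a key of `metadata`."""
    result = {}
    for key, values in merged.items():
        row = []
        for value in values:
            if value in metadata:
                # put the metadata right before the value it belongs to
                row.extend(metadata[value][:count])
            row.append(value)
        result[key] = row
    return result


merged = {'key1': ['x1', 'x2', 'x3'], 'key2': ['y1', 'y2', 'y3']}
d4 = {'x2': ['a', 'b', 'c'], 'y2': ['d', 'e', 'f']}
combined = insert_metadata(merged, d4)
# combined['key1'] is ['x1', 'a', 'b', 'x2', 'x3']
```

Because the helper builds new lists, the input dictionaries are not modified.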
    

    Check the output:

    >>> new_dict
    {'key2': ['y1', 'y2', 'y3', 'y4', 'y5', 'y6', 'y7', 'y8', 'y9', 'd', 'e'], 'key3': ['z1', 'z2', 'z3', 'z4', 'z5', 'z6', 'z7', 'z8', 'z9', 'g', 'h'], 'key1': ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'a', 'b']}
    

    This is basically your desired result, except that it includes the 6's and 7's and puts the letters from d4 at the end of each list rather than next to the matching value. Can you provide some background as to why your desired output looks that way? Anyway, this should get you started.
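
For the tabular output requested in the question, each key and its values can be joined with a separator. This is a minimal sketch, assuming all values are strings and that the rows should be printed in sorted key order; format_rows is a hypothetical helper name.

```python
def format_rows(data, sep='\t'):
    """Build one line per key: the key followed by its values."""
    return [sep.join([key] + list(values))
            for key, values in sorted(data.items())]


final_dict = {
    'key1': ['x1', 'a', 'b', 'x2', 'x3'],
    'key2': ['y1', 'd', 'e', 'y2', 'y3'],
}
for line in format_rows(final_dict):
    print(line)
```

Writing print with parentheses and a single argument keeps the sketch usable on both Python 2, which the question's script targets, and Python 3.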