python-2.7 dictionary ordereddict

Group values of an (ordered) dictionary based on a condition within the values


I have a sorted dictionary (an OrderedDict, to be exact) and I want to make a new dictionary (or edit the old one) that groups specific values based on a condition. The main problem is that I need to compare each entry in the dictionary with the previous one and keep grouping (multiple) entries together until the condition is no longer true. I know this is quite vague, so here is an example:

{ 
'5': ['a', 300, 350, 'name1'],
'98': ['a', 370, 450, 'name2'],
'115': ['a', 540, 600, 'name3'],
'7': ['a', 900, 960, 'name4'],
'12': ['a', 980, 1200, 'name5'],
'24': ['a', 2000, 2200, 'name6'],
'25': ['b', 100, 150, 'name7'],
'100': ['b', 190, 270, 'name8'],
'200': ['b', 280, 350, 'name9'],
'99': ['b', 370, 500, 'name10'],
'4': ['b', 980, 1200, 'name11']
}

Here I want to compare the "end" value of an entry (the third value, here 350) with the "start" value of the next entry (the second value, here 370), and only within the same group, a or b (I can also make two separate dictionaries if that complicates things too much; there are only two groups anyway). If the gap between that end and the next start is below some threshold, for example < 100, the two entries go into the same group. Keep adding entries until the condition is no longer true. I don't need all the values in my new dictionary afterwards, so a possible outcome might be:

{
'Group_1': ['a', 'name1; name2; name3'],
'Group_2': ['a', 'name4; name5'],
'Group_3': ['a', 'name6'],
'Group_4': ['b', 'name7; name8; name9; name10'],
'Group_5': ['b', 'name11']
}

I have thought about this a lot, but the only approach I could come up with is a loop like this:

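new_dict, counter, prev = {}, 0, None
for key, value in sorted_dict.iteritems():
    # compare this entry's start with the previous entry's end
    if prev is not None and value[1] - prev[2] < 100:
        counter += 1
        new_dict[counter] = [prev[3], value[3]]  # can only ever pair two entries
    prev = value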

But that doesn't seem right, because it can only ever produce groups of at most two entries, and I think it would be very hard to extend to groups of different lengths. I also suspect I shouldn't be doing this with dictionaries at all, but my Python knowledge falls short here: the problem is quite simple in theory, but I find it difficult to handle in terms of data structures.

I looked at this post, which is the most similar one I could find on the web, but I don't think it really applies to my case.

Any help would be appreciated. The OrderedDict I have is sorted first by group (a or b) and then by start value.


Solution

  • The following code produces an outcome similar to what you described, though I changed the data structure of the result to be more compact:

    sorted_dict = {
        '5': ['a', 300, 350, 'name1'],
        '98': ['a', 370, 450, 'name2'],
        '115': ['a', 540, 600, 'name3'],
        '7': ['a', 900, 960, 'name4'],
        '12': ['a', 980, 1200, 'name5'],
        '24': ['a', 2000, 2200, 'name6'],
        '25': ['b', 100, 150, 'name7'],
        '100': ['b', 190, 270, 'name8'],
        '200': ['b', 280, 350, 'name9'],
        '99': ['b', 370, 500, 'name10'],
        '4': ['b', 980, 1200, 'name11']
    }
    
    values = sorted(sorted_dict.values())
    result = {values[0][0] : [[values[0][3]]]}
    
    for list_1, list_2 in zip(values[:-1], values[1:]):
        if list_1[0] == list_2[0]:
            # assuming list_2[1] >= list_1[2], otherwise use abs(list_2[1] - list_1[2])
            if list_2[1] - list_1[2] < 100:
                result[list_1[0]][-1] += [list_2[3]]
            else:
                result[list_1[0]] += [[list_2[3]]]
        else:
            result[list_2[0]] = [[list_2[3]]]
    
    print(result)
    

    Result:

    > {'a': [['name1', 'name2', 'name3'], ['name4', 'name5'], ['name6']], 'b': [['name7', 'name8', 'name9', 'name10'], ['name11']]}
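
If the exact `Group_N` layout from the question is needed, the same idea can be sketched as a single pass over the ordered values. This is only a minimal sketch, not part of the code above: `group_by_gap` and `max_gap` are made-up names, and it assumes the OrderedDict is already sorted by group and then by start value, as the question states.

```python
from collections import OrderedDict


def group_by_gap(sorted_dict, max_gap=100):
    """Merge consecutive entries of the same group whose gap is below max_gap.

    Hypothetical helper: assumes values look like [group, start, end, name]
    and that the dictionary is already ordered by group, then by start.
    """
    groups = []   # each item: [group_label, [names, ...]]
    prev = None   # (group_label, end) of the previous entry
    for label, start, end, name in sorted_dict.values():
        same_group = prev is not None and prev[0] == label
        if same_group and start - prev[1] < max_gap:
            # Close enough to the previous entry: extend the current group.
            groups[-1][1].append(name)
        else:
            # New label or gap too large: start a new group.
            groups.append([label, [name]])
        prev = (label, end)
    return OrderedDict(
        ('Group_%d' % i, [label, '; '.join(names)])
        for i, (label, names) in enumerate(groups, 1)
    )


sorted_dict = OrderedDict([
    ('5', ['a', 300, 350, 'name1']),
    ('98', ['a', 370, 450, 'name2']),
    ('115', ['a', 540, 600, 'name3']),
    ('7', ['a', 900, 960, 'name4']),
    ('12', ['a', 980, 1200, 'name5']),
    ('24', ['a', 2000, 2200, 'name6']),
    ('25', ['b', 100, 150, 'name7']),
    ('100', ['b', 190, 270, 'name8']),
    ('200', ['b', 280, 350, 'name9']),
    ('99', ['b', 370, 500, 'name10']),
    ('4', ['b', 980, 1200, 'name11']),
])

grouped = group_by_gap(sorted_dict)
for key, value in grouped.items():
    print(key, value)
```

Because each entry is compared only with the one directly before it, a group keeps growing for as long as consecutive gaps stay under the threshold, so groups of any length fall out of the same loop.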