pythonpathsetnested-listsitertools-groupby

get group of sublists that share common elements but those elements are not the same in all sublists


I have to group sub-lists with a common element in set "p", using python. p elements are not always in 1st or las position on the sub-list. Grouped (result) lists is list r.

Result list r is build from and initial list l. 1st sub-list in r must contain starting point start, and intermediate sub-lists that are connected by elements from list p. The last sub-list in r must contain ending point end. Here a small example.

#starting point
start = 1

#ending point
end = 9

 #items to be used to link "start" and "end" points.
p = [2, 4, 5]

#complete nested list
l = [
    [0, 7],
    [1, 2],
    [8, 15, 19, 20],
    [0, 6],
    [2, 3, 5],
    [10, 14],
    [5, 8, 4, 3],
    [4, 9, 6],
    [14, 21],
    [20, 9]
]

#result list, with "start" in sublist 0, "end" in sublist 3.
r = [
    [1, 2],
#----^
    [2, 3, 5],
    [5, 8, 4, 3],
    [4, 9, 6]
#-------^
]

This is the code I tried to use to get list r.

#get routes with "start" element
ok_routes=[]
for i in range(len(l)):
       if start in l[i]:
        ok_routes.append((l[i]))



#get routes with "end" element
dk_routes=[]
for i in range(len(l)):
       if end in l[i]:
        dk_routes.append((l[i]))


#start r, appending "start" element
r_temp=[]
for i in range(len(ok_routes)):
    
    r_temp.append([ok_routes[i]])
    


#Using shuffle to generate al possible combinations of sublists
a_shuffle = copy.deepcopy(l) 
random.shuffle(a_shuffle)


#generate r using intersection of sets.
r=[]
for i in range(0,len(r_temp)):   
    for j in range(0,len(a_shuffle)):
        if set(p).intersection(a_shuffle[j]):
            r.append(a_shuffle[j])
    break

#r=[[2, 3, 5], [1, 2], [4, 9, 6], [5, 8, 4, 3]]

This code works sometimes and sometimes it does not. With more sub-lists with "start" and "end" points, "r" is not valid, because the sub-lists are not going to be fully "intersected" by elements in p, e.g: if sub-list [2,8] is added, I got this solution,

r = [[5, 8, 4, 3], [2, 8],[2, 3, 5], [1, 2], [4, 9, 6]]

and this solution is not valid because sub-list [2,8] must not be considered because "8" is not a start or ending point.

Thanks in advance.


Solution

  • Using networkx.shortest_path, which uses Dijkstra's algorithm by default.

    I suggest building a networkx Graph with one node per sublist in your list l, connected if they share an element from p, plus one node called 'start' and one node called 'end' which are connected to the possible start lists and end lists.

    import networkx as nx
    from itertools import combinations
    
    p = set([2, 4, 5])
    l = [ [0, 7], [1, 2], [8, 15, 19, 20], [0, 6], [2, 3, 5], [10, 14], [5, 8, 4, 3], [4, 9, 6], [14, 21], [20, 9] ]
    start, end = 1, 9
    
    cliques = {}
    possible_starts = []
    possible_ends = []
    for i,sublist in enumerate(l):
        for x in sublist:
            if x in p:
                cliques.setdefault(x, []).append(i)
            if x == start:
                possible_starts.append(i)
            if x == end:
                possible_ends.append(i)
    
    # print(cliques)
    # {2: [1, 4], 5: [4, 6], 4: [6, 7]}
    
    G = nx.Graph()
    for clique in cliques.values():
        G.add_edges_from(combinations(clique, 2))
    
    G.add_edges_from(('start', i) for i in possible_starts)
    G.add_edges_from((i, 'end') for i in possible_ends)
    
    path = nx.shortest_path(G, source='start', target='end')
    r = [l[i] for i in path[1:-1]]
    
    print(path)
    print(r)
    # ['start', 1, 4, 6, 7, 'end']
    # [[1, 2], [2, 3, 5], [5, 8, 4, 3], [4, 9, 6]]