pythoncollectionsgrouping

How can I group an array of tuplets by common elements?


I have a series of segments defined as an array of two tuplets (two vertices). Each segment can share a vertex with another segment and it is possible that a combination of segments will form one or more closed loops.

In order to find the closed loops my current approach would be to find which lines are connected (share a vertex) and group them before ordering each group clockwise.

Let's take the following example (written in Python)

lines = {
    "A": [(0, 0), (0, 1)],  # A,B -- B,C
    "B": [(0, 1), (1, 1)],  # |       |
    "C": [(1, 1), (1, 0)],  # |       |
    "D": [(1, 0), (0, 0)],  # A,D -- C,D
    
    "E": [(4, 4), (4, 5)],  # E,F - F,G
    "F": [(4, 5), (5, 5)],  # |   /
    "G": [(5, 5), (4, 4)],  # E,G
}

answer = ["ABCD", "EFG"]

What would be the best way to group the entries (using the standard Python library) that share a vertex (tuplet value)? The order in which the lines are found in the dictionary is not guaranteed but ordered clockwise here to make it easier to read.

I have tried a simple cycle approach (graph theory) to this problem but the cycles returned included many more than I wanted using the vertices in the example above so I have revisited my approach to the problem which resulted in me asking this question.


Solution

  • Since you asked for a solution that only involves the standard library, here's one.

    1. Create an adjacency list, where each key is a vertex and the list of values is a list of the edges for that vertex
    from collections import defaultdict
    
    adjacency_list = defaultdict(list)
    for key, (v1, v2) in lines.items():
        adjacency_list[v1].append((v2, key))
        adjacency_list[v2].append((v1, key))
    
    print(adjacency_list) # {(0, 0): [((0, 1), 'A'), ((1, 0), 'D')], ...}
    
    1. Traverse the graph using depth-first search
    def depth_first_search(node, visited, adjacency_list, path):
        """Traverse the graph to find connected components"""
        stack = [(node, None)]
        while stack:
            v, parent = stack.pop()
            if v in visited:
                continue
            visited.add(v)
            path.append(v)
            for neighbor, line in adjacency_list[v]:
                if neighbor not in visited:
                    stack.append((neighbor, v))
        return path
    
    visited = set()
    
    filtered_list = filter(lambda node: node not in visited, adjacency_list)
    components = map(lambda node: depth_first_search(node, visited, adjacency_list, []), filtered_list)
    
    1. Find the loops from the components
    def find_loops(component, adjacency_list):
        if not component:
            return ""
        
        # Find the loop by following the path
        loop = []
        visited_edges = set()
        current_node = component[0]
        
        while True:
            for neighbor, line in adjacency_list[current_node]:
                if current_node < neighbor:
                  edge = (current_node, neighbor)
                else:
                  edge = (neighbor, current_node)
                if edge not in visited_edges:
                    visited_edges.add(edge)
                    loop.append(line)
                    current_node = neighbor
                    break
            else:
                break
        
        return "".join(sorted(loop))
    
    answer = [find_loops(component, adjacency_list) for component in components]
    
    print(answer) # ['ABCD', 'EFG']
    

    If the code is not clear enough, I can explain it some more.