pythonarraysnumpyscikit-learnnearest-neighbor

Nearest neighbor for list of arrays


`I have a list of arrays like this (in x, y coordinates):

coordinates= array([[ 300, 2300],
      [ 670, 2360],
      [ 400, 2300]]), array([[1500, 1960],
      [1620, 2200],
      [1505, 1975]]), array([[ 980, 1965],
      [1060, 2240],
      [1100, 2250],
      [ 980, 1975]]), array([[ 565, 1940],
      [ 680, 2180],
      [ 570, 1945]])]

I want the arrays to be sorted by the nearest neighbor. So, the last coordinate of the first array should be close to the first coordinate of the next array and so forth.

I extracted all of the first and last coordinates and put them in two lists using python numpy. Then tried to use sklearn.neighbors NearestNeighbors, but it didn't work.`

expected output:

coordinates= array([[ 300, 2300],
      [ 670, 2360],
      [ 400, 2300]]), array([[ 565, 1940],
      [ 680, 2180],
      [ 570, 1945]]), array([[ 980, 1965],
      [1060, 2240],
      [1100, 2250],
      [ 980, 1975]]), array([[1500, 1960],
      [1620, 2200],
      [1505, 1975]])]

Solution

  • This looks like a traveling_salesman_problem. Compute the pairwise distance between all nodes, make the self distance NaN/Inf, then get the shortest path:

    from numpy import array
    import networkx as nx
    
    coordinates= [array([[ 300, 2300],
                         [ 670, 2360],
                         [ 400, 2300]]),
                  array([[1500, 1960],
                         [1620, 2200],
                         [1505, 1975]]),
                  array([[ 980, 1965],
                         [1060, 2240],
                         [1100, 2250],
                         [ 980, 1975]]),
                  array([[ 565, 1940],
                         [ 680, 2180],
                         [ 570, 1945]])]
    
    # extract starts and ends
    starts = np.vstack([a[0]  for a in coordinates])
    ends   = np.vstack([a[-1] for a in coordinates])
    
    # compute pairwise distances, except self
    dist = np.sqrt(((starts[:, None] - ends)**2).sum(-1))
    np.fill_diagonal(dist, np.nan)
    
    # build the graph
    G = nx.from_numpy_array(np.round(dist, 1), create_using=nx.DiGraph)
    G.remove_edges_from(nx.selfloop_edges(G))
    
    # find shortest path (NB. cycle could be False)
    path = nx.approximation.traveling_salesman_problem(G, cycle=True)
    # [1, 0, 3, 2, 1]
    
    out = [coordinates[i] for i in path[1:]]
    

    If you don't necessarily want a cycle, use cycle=False and out = [coordinates[i] for i in path].

    Output:

    [array([[ 300, 2300],
            [ 670, 2360],
            [ 400, 2300]]),
     array([[ 565, 1940],
            [ 680, 2180],
            [ 570, 1945]]),
     array([[ 980, 1965],
            [1060, 2240],
            [1100, 2250],
            [ 980, 1975]]),
     array([[1500, 1960],
            [1620, 2200],
            [1505, 1975]])]
    

    Intermediate dist:

    array([[          nan, 1248.05849222,  753.67433285,  446.01008957],
           [1151.34703717,           nan,  520.21630117,  930.12095988],
           [ 669.79474468,  525.09522946,           nan,  410.48751504],
           [ 396.01136347,  940.65137006,  416.47328846,           nan]])
    

    Graph (with distances as edge labels and the shortest path in red):

    networkx shortest path traveling salesman