rigraphsfnetwork

[R/igraph]: Select a path from a nested list of paths based on first and last node


I calculate a nested list of all possible paths of a given length in a graph
and would like to select only certain paths´s based on first and last element value of the path.

Here is what I do:

First I create a graph

library(sfnetworks)
net = as_sfnetwork(roxel, directed = FALSE) %>%
  st_transform(3035) %>%
  activate("edges") %>%
  mutate(weight = edge_length())

Then I get all distances in this graph and select ony certain ones
with a distance greater than 2500 and smaller than 2500*1.5

path_distance<-2500
dd = distances(net)
dd[dd == Inf] <- 0

ia = which(dd>=path_distance & dd<path_distance*1.5,arr.ind = TRUE)
a<-(ia[,1])

Now I get all the paths´s with those constrains in this graph:

get_path<-function(net,ia,dd){
  
  v.from = V(net)[ia %% ncol(dd)]
  v.to = V(net)[ceiling(ia / ncol(dd))]
  shortest_paths(net, from = v.from, to = v.to)$vpath
  
}

possible_paths<-lapply(a, function (x) get_path(net = net,ia = x,dd = dd))

Result is a nested list of possible paths.
I would like now to select only those paths which are starting with a node 495 and end with 458 if there are such.


Solution

  • The function igraph::distances gives the shortest distance between all pairs of vertices, so there is only a single path examined here, and its length doesn't meet your criteria:

    distances(net)[495, 458]
    #> [1] 1614.252
    
    distances(net)[458, 495]
    #> [1] 1614.252
    

    If I understand you correctly, you want to find paths between node 495 and 458 that are between 2500 and 3750. There are an awful lot of these. To find all the simple paths between two nodes, we can use all_simple_paths. However, we would quickly run out of memory if we don't limit the number of nodes allowed for any single path. For example, if we do:

    sapply(1:25, function(i) length(all_simple_paths(net, 495, 458, cutoff = i)))
    #>  [1]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
    #> [20]   1   5  14  31  66 138 286
    

    we can see that there are no paths between 495 and 458 that are shorter than 20 nodes long. There is a single path with 20 nodes, 5 possible paths of 21 nodes or less and so on up to 286 paths of length 25 nodes or less. This number grows exponentially, so for example by the time we consider all paths of 30 nodes or fewer, there are 7038 possible paths.

    length(all_simple_paths(net, 495, 458, cutoff = 30))
    #> [1] 7038
    

    If you increase this to just 32 nodes, we have over 22,000 possible paths, and my machine hangs on 35 nodes. It's therefore not feasible to get all simple paths in a network of this size. We can certainly get lots of paths that meet your criteria.

    The following function allows you to get all paths of a particular length between two nodes, given a maximum number of nodes in the path:

    paths_of_length <- function(net, start, end, min, max, max_nodes = 12) {
      possible_paths <- all_simple_paths(net, start, end, cutoff = max_nodes) %>%
        lapply( function(x) suppressMessages(
          as_sfnetwork(induced_subgraph(net, x))))
    
      path_lengths <- sapply(possible_paths, function(x) max(distances(x)))
      return(possible_paths[path_lengths > min & path_lengths < max])
    }
    

    For example, let's find all paths between node 495 and 458 that are longer than 2500 and shorter than 3750, but only considering paths with 23 nodes or fewer:

    p <- paths_of_length(net, 495, 458, 2500, 2500 * 1.5, max_nodes = 23)
    

    We can see these are all within range:

    sapply(p, function(x) max(distances(x)))
    #> [1] 2532.525 2592.271 2514.816 2539.518 2545.331 2514.816 2521.846
    

    and we can even plot them to see what they look like:

    par(mfrow = c(3, 3))
    for(i in seq_along(p)) {
      plot(net)
      plot(p[[i]], col = "red", add = TRUE)
    }
    

    enter image description here

    However, with a network of this size, you will not be able to realistically get all paths within the given distance.


    EDIT

    To get a bit more variation between paths, increase the number of nodes and sample from them. For example, to get 9 different paths with reasonable variation, you can try:

    p <- paths_of_length(net, 495, 458, 2500, 2500 * 1.5, max_nodes = 26)
    
    p <- sample(p, 9)
    
    par(mfrow = c(3, 3))
    for(i in seq_along(p)) {
      plot(net)
      plot(p[[i]], col = "red", add = TRUE)
    }
    

    enter image description here