I calculate a nested list of all possible paths of a given length in a graph
and would like to select only certain paths´s based on first and last element value of the path.
Here is what I do:
First I create a graph
library(sfnetworks)
net = as_sfnetwork(roxel, directed = FALSE) %>%
st_transform(3035) %>%
activate("edges") %>%
mutate(weight = edge_length())
Then I get all distances in this graph and select ony certain ones
with a distance
greater than 2500 and smaller than 2500*1.5
path_distance<-2500
dd = distances(net)
dd[dd == Inf] <- 0
ia = which(dd>=path_distance & dd<path_distance*1.5,arr.ind = TRUE)
a<-(ia[,1])
Now I get all the paths´s with those constrains in this graph:
get_path<-function(net,ia,dd){
v.from = V(net)[ia %% ncol(dd)]
v.to = V(net)[ceiling(ia / ncol(dd))]
shortest_paths(net, from = v.from, to = v.to)$vpath
}
possible_paths<-lapply(a, function (x) get_path(net = net,ia = x,dd = dd))
Result is a nested list of possible paths.
I would like now to select only those paths which are starting with a node 495 and end with 458 if there are such.
The function igraph::distances
gives the shortest distance between all pairs of vertices, so there is only a single path examined here, and its length doesn't meet your criteria:
distances(net)[495, 458]
#> [1] 1614.252
distances(net)[458, 495]
#> [1] 1614.252
If I understand you correctly, you want to find paths between node 495 and 458 that are between 2500 and 3750. There are an awful lot of these. To find all the simple paths between two nodes, we can use all_simple_paths
. However, we would quickly run out of memory if we don't limit the number of nodes allowed for any single path. For example, if we do:
sapply(1:25, function(i) length(all_simple_paths(net, 495, 458, cutoff = i)))
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> [20] 1 5 14 31 66 138 286
we can see that there are no paths between 495 and 458 that are shorter than 20 nodes long. There is a single path with 20 nodes, 5 possible paths of 21 nodes or less and so on up to 286 paths of length 25 nodes or less. This number grows exponentially, so for example by the time we consider all paths of 30 nodes or fewer, there are 7038 possible paths.
length(all_simple_paths(net, 495, 458, cutoff = 30))
#> [1] 7038
If you increase this to just 32 nodes, we have over 22,000 possible paths, and my machine hangs on 35 nodes. It's therefore not feasible to get all simple paths in a network of this size. We can certainly get lots of paths that meet your criteria.
The following function allows you to get all paths of a particular length between two nodes, given a maximum number of nodes in the path:
paths_of_length <- function(net, start, end, min, max, max_nodes = 12) {
possible_paths <- all_simple_paths(net, start, end, cutoff = max_nodes) %>%
lapply( function(x) suppressMessages(
as_sfnetwork(induced_subgraph(net, x))))
path_lengths <- sapply(possible_paths, function(x) max(distances(x)))
return(possible_paths[path_lengths > min & path_lengths < max])
}
For example, let's find all paths between node 495 and 458 that are longer than 2500 and shorter than 3750, but only considering paths with 23 nodes or fewer:
p <- paths_of_length(net, 495, 458, 2500, 2500 * 1.5, max_nodes = 23)
We can see these are all within range:
sapply(p, function(x) max(distances(x)))
#> [1] 2532.525 2592.271 2514.816 2539.518 2545.331 2514.816 2521.846
and we can even plot them to see what they look like:
par(mfrow = c(3, 3))
for(i in seq_along(p)) {
plot(net)
plot(p[[i]], col = "red", add = TRUE)
}
However, with a network of this size, you will not be able to realistically get all paths within the given distance.
EDIT
To get a bit more variation between paths, increase the number of nodes and sample from them. For example, to get 9 different paths with reasonable variation, you can try:
p <- paths_of_length(net, 495, 458, 2500, 2500 * 1.5, max_nodes = 26)
p <- sample(p, 9)
par(mfrow = c(3, 3))
for(i in seq_along(p)) {
plot(net)
plot(p[[i]], col = "red", add = TRUE)
}