rigraphshortest-pathr-sfsfnetwork

Create subgraph from shortest path calculation sfnetworks R


I have a network, and a focal node within the network. I have calculated the shortest paths to all other nodes from that node, and I'm trying to figure out how to make a sub-graph of the original network from that shortest path calculation.

The goal is to be able to show the all the paths from my focal node to any other node that match a criteria (e.g. that the path is less than 50m or longer than 100m but shorter than 500m).

library(sfnetworks)
library(sf)
library(tidygraph)
library(dplyr)
library(ggplot2)

# the base network
net <- as_sfnetwork(roxel, directed = FALSE) %>%
  st_transform(3035) %>%
  activate("edges") %>%
  mutate(weight = edge_length()) 

# the network with the focal point
ggplot() +
  geom_sf(data = net %>%
            activate("edges") %>%  
            st_as_sf(), colour = "grey90") +
  geom_sf(data = net %>%
            activate("nodes") %>%  
            st_as_sf())  + 
    geom_sf(data = net %>%
            activate("nodes") %>%  
            slice(495) %>% 
            st_as_sf(), size = 3.5, fill = "orange", 
          colour = "black", shape = 21) +
  theme_void()

enter image description here

I can get the shortest paths using the sfnetworks wrapper or directly from igraph:

short_paths_sf <- st_network_paths(net %>% activate(nodes), 
                          from = 495, to = c(1:701), weights = "weight")
short_paths_ig <- igraph::shortest_paths(
  graph = net, 
  from = 495,
  to = c(1:701),
  output = "both",
  weights = net %>% activate(edges) %>% pull(weight)
)

But I think I'm missing two key parts here. 1) Making the subgraph from the net object and containing just the shortest paths and 2) then transforming the subgraph so I can do operations on the length of the whole path not just the weight of an edge.

I know I can get the length of a single path between two points with something like

single_path <- st_network_paths(net %>% activate(nodes),
                 from = 495,
                 to = 701,
                 weights = "weight") %>% 
  pull(edge_paths) %>% 
  unlist()

net %>% 
  activate(edges) %>%
  slice(single_path) %>%
  st_as_sf() %>%
  st_combine() %>%
  st_length()
> 482.0495 [m]

Which I can plot with

ggplot() +
  geom_sf(data = net %>%
            activate("edges") %>%  
            st_as_sf(), colour = "grey90") +
  geom_sf(data = net %>%
            activate("nodes") %>%  
            st_as_sf())  + 
    geom_sf(data = net %>%
            activate("nodes") %>%  
            slice(495) %>% 
            st_as_sf(), size = 3.5, fill = "orange", 
          colour = "black", shape = 21) +
  geom_sf(data = net %>%
            activate(edges) %>%
            slice(single_path) %>% 
            st_as_sf(), colour = "firebrick") +
  theme_void()

enter image description here

But I'm struggling to figure out how to do that for the single-to-many case, and then especially, how to filter those paths such that, for example, I could look at only paths from point 495 that are shorter than 500m.

I tried to apply what the authors here suggest in the shortest paths section:

sub_graph <- net %>% 
  igraph::subgraph.edges(eids = short_paths_ig$epath %>% unlist()) %>% 
  as_tbl_graph()

But then it's unclear to me how this will show me my shortest paths?

Any help much appreciated!!


Solution

  • EDIT:

    I spent a while figuring out a better version of my first solution (below) so here's a much faster way (I didn't benchmark the final solution just step-wise, but it ends up being like >50x faster).

    So we start the same way exactly:

    library(sfnetworks)
    library(parallel)
    library(tidyverse)
    
    # Load the roxel dataset
    net <- as_sfnetwork(roxel, directed = FALSE) %>%
      st_transform(3035) %>%
      activate("edges") %>%
      mutate(weight = edge_length())
    

    Calculate shortest paths from focal node to all other nodes, and save the edge and node paths:

    paths = st_network_paths(net, from = 495, weights = "weight")
    
    nodes_all <- paths %>%
      pull(node_paths) 
    
    edges_all <- paths %>%
      pull(edge_paths) 
    

    Now, the longest part of this is actually getting the length of each of the paths to check against our criteria, so I put this process into a function and parallelized it. First the function:

    slice_fun <- function(net, temp_edges) {
      #' Find path length of one path
      #' 
      #' @description Taking a single path (i.e. set of edges), get the length 
      #' of that path, and remove the units on it
      #' @param net sfnetwork. A network of sfnetwork
      #' @param temp_edges vector. The edges of the path at hand 
      #' 
      #' @return numeric value (units removed) of the length of that path
      
      return(units::drop_units(net %>% 
        activate("edges") %>% 
        slice(temp_edges) %>% 
        st_as_sf() %>% 
        st_combine() %>% 
        st_length()))
    }
    

    Now parllelize with parSapply() (this is the biggest time saver since the calculation of the length with all the sf::/sfnetworks:: functions take the longest):

    start_time <- Sys.time()
    # start the cluster
    cl <- parallel::makeCluster(8)
    
    # the libraries need to be evaluated so they're available on all workers
    parallel::clusterEvalQ(cl, {library(dplyr); library(sfnetworks); 
      library(magrittr); library(sf)})
    
    # export the objects you need for the process
    parallel::clusterExport(cl, varlist = c("edges_all", "net"))
    
    # run in parallel
    result <- parSapply(cl, edges_all, slice_fun, net = net)
    
    parallel::stopCluster(cl)
    end_time <- Sys.time()
    end_time - start_time
    # > Time difference of 14.49258 secs
    

    Now, find the indices of the edges_all that pass our criteria (here, we only want paths longer than 1000m), and filter the edges by those indices, unlist() so all the edges are together, and keep only unique edges:

    indices_keep <- which(result > 1000)
    keep_edges <- unique(edges_all[indices_keep] %>% unlist())
    

    If we want the nodes at the ends of the the paths that fulfill our requirement, we can do so like this:

    # we can get the nodes if we want too
    nodes_to_keep_all <- nodes_all[indices_keep]
    nodes_to_keep <- unlist(lapply(nodes_to_keep_all, tail, n = 1L) %>% unlist())
    

    Now plot:

    ggplot() + 
      geom_sf(data = net %>%
                activate("edges") %>%  
                st_as_sf(), colour = "grey90") +
      geom_sf(data = net %>%
                activate("nodes") %>%  
                st_as_sf(), colour = "grey90")  + 
      geom_sf(data = net %>%
                activate("edges") %>%
                slice(keep_edges) %>% 
                st_as_sf()
      ) +
      geom_sf(data = net %>%
                activate("nodes") %>%  
                slice(495) %>% 
                st_as_sf(), size = 3.5, fill = "orange", colour = "black", shape = 21) +
      geom_sf(data = net %>%
                activate("nodes") %>%
                slice(nodes_to_keep) %>%
                st_as_sf(), size = 2, fill = "red", colour = "black", shape = 21,
              alpha = 0.4) +
      theme_void()
    

    enter image description here


    Original solution:

    Ok so I figured out a custom solution to this, where the criteria is if the path is > 1000m.

    We can start by getting our network:

    net = as_sfnetwork(roxel, directed = FALSE) %>%
      st_transform(3035) %>%
      activate("edges") %>%
      mutate(weight = edge_length())
    
    

    Then we calculate the shortest paths from a focal node to all the other nodes, and we save the edge_paths and node_paths from that run to objects for easy access.

    paths = st_network_paths(net, from = 495, weights = "weight")
    
    # get nodes & edges both as paths (this results in a list format)
    nodes_all <- paths %>%
      pull(node_paths) 
    
    edges_all <- paths %>%
      pull(edge_paths) 
    
    

    Now I wrote a custom function that I have not optimized, it's not very fast at all, but it's a verbosely explicit version of what I'm trying to do. The function takes a particular path, figures out if that path is long enough for my criteria, and if so, keeps the edges involved in that path. There's also an option to keep or leave out the nodes that are at the end of those long-enough paths.

    The function:

    len_crit <- function(net, edges, nodes = NULL) {
      #' Look at whether or not each of the individual paths calcualted actually
      #' pass the required test
      #' 
      #' @description  Look at whether or not each of the individual paths 
      #' calculated actually pass the required test
      #' @param net sfnetwork. A network of sfnetwork
      #' @param slice_val integer. The value to slice the edges into 
      #' @param edges list. The list of the edges in the shortest path
      
      # initialize empty vector
      all_edges <- as.numeric()
      # initialize empty vector for nodes if applicable 
      if(!is.null(nodes)) {
        all_nodes <- as.numeric()
      }
      
      # go through each of the slices (aka each of the paths)
      for(slice in 1:length(edges)) {
        
        # check what the temporary path length is
        temp_len <- net %>% 
          activate("edges") %>% 
          slice(edges[[slice]]) %>% 
          st_as_sf() %>% 
          st_combine() %>% 
          st_length()
        
        # if the temporary length is long enough, add the edges of that path to 
        # the total edges
        if(temp_len > units::set_units(1000, m)) {
          # if the length of the current path is long enough, add it to all_edges
          all_edges <- c(all_edges, edges[[slice]])
          
          # if we also want to plot the nodes, we can do so
          if(!is.null(nodes)) {
            # if the length is long enough, keep the LAST node in that set
            all_nodes <- c(all_nodes, nodes[[slice]][[length(nodes[[slice]])]]) 
          }
        }
      }
      
      # if the nodes are selected return both that and the edges
      if(!is.null(nodes)) {
        # keep only the unique ones
        unique_nodes <- unique(all_nodes)
        # keep only the unique ones
        unique_edges <- unique(all_edges)
        # list up both
        nodes_edges <- list(
          nodes = unique_nodes,
          edges = unique_edges 
        )
        # return both
        return(nodes_edges)
      }
    
      # keep only the unique ones
      unique_edges <- unique(all_edges)
      
      return(unique_edges)
    }
    
    

    Now I can run it:

    short_edges <- len_crit(net = net, edges = edges_all, nodes = nodes_all)
    

    And plot it by slicing the net, with activate of the nodes and edges respectively, and sliced by the short_edges$nodes/edges:

      ggplot() + 
        geom_sf(data = net %>%
                  activate("edges") %>%  
                  st_as_sf(), colour = "grey90") +
        geom_sf(data = net %>%
                  activate("nodes") %>%  
                  st_as_sf(), colour = "grey90")  + 
        geom_sf(data = net %>%
                     activate("edges") %>%
                     slice(short_edges$edges) %>% 
                     st_as_sf()
                     ) +
        geom_sf(data = net %>%
                  activate("nodes") %>%  
                  slice(495) %>% 
                  st_as_sf(), size = 3.5, fill = "orange", colour = "black", shape = 21) +
        geom_sf(data = net %>%
                  activate("nodes") %>%  
                  slice(short_edges$nodes) %>% 
                  st_as_sf(), size = 2, fill = "red", colour = "black", shape = 21,
                alpha = 0.4) +
        theme_void()
    

    Which gives this, exactly what I want :)

    enter image description here