Tags: r, tree, ggraph, tidygraph

Network/tree data: calculate number of independent trees and average maximum edges per independent tree


I would like to draw network plots using tidygraph and ggraph.

I have a large tibble of items connected via from and to columns. Some of the trees are connected to each other (the trees starting at a0 and b0 in the example).

I would like to:

  1. Count the number of independent trees
  2. Calculate the average maximum number of edges (connections) per independent tree. The maximum should be measured "downstream", i.e. from a0 to k2 or a4, not from a0 to b0 in the example data.

Example:

library(tidygraph)
library(igraph)
library(ggraph)
library(tidyverse)


# make edges
edges <- tibble(from = c("a0", "a1", "a2", "a3", "b0", "b1", "c0", "c1", "a2", "k1"),
                to   = c("a1", "a2", "a3", "a4", "b1", "a3", "c1", "c2", "k1", "k2"))


# make nodes
nodes <- unique(c(edges$from, edges$to))
nodes <- tibble(node = nodes,
                label = nodes)


# build the igraph object from the edge and node tables
routes_igraph <- graph_from_data_frame(d = edges,
                                       vertices = nodes,
                                       directed = TRUE)

routes_tidy <- as_tbl_graph(routes_igraph)

# plot network
ggraph(routes_tidy, layout = "tree") + 
  geom_edge_link() + 
  geom_node_point() + 
  theme_graph() +
  geom_node_text(aes(label = label), repel = TRUE)

Created on 2023-04-16 with reprex v2.0.2

Desired output

  1. Number of independent trees of the given edges and nodes: 2

  2. Average maximum edges per independent tree: 3.5, 2


Solution

  • Here is one way. It borrows a function, height, from this SO post, modified to measure distances with mode = "in", i.e. from a vertex v to every vertex that can be reached downstream of it.

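    # height of v: the largest finite distance from v to any vertex
    # reachable downstream of it
    height <- function(v, g) {
      D <- distances(g, to=v, mode="in")
      max(D[D != Inf])
    }
    
    # split the graph into its weakly connected components
    cmp <- components(routes_igraph)
    sp <- split(names(cmp$membership), cmp$membership)
    sub_tree_list <- lapply(sp, \(v) induced_subgraph(routes_igraph, v))
    # height of every vertex within its own component
    sub_tree_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, sp)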
    
    # number of components
    length(sp)
    #> [1] 2
    
    # height of each sub-tree
    sapply(sub_tree_height, max)
    #> 1 2 
    #> 4 2
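
Since the question works with tidygraph, the component count can also be sketched there. This is a minimal sketch, not part of the original answer: it rebuilds the example edges as a plain data frame and assumes group_components() from tidygraph and count_components() from igraph; the names node_tbl, n_components and n_components_igraph are made up for illustration.

```r
library(igraph)
library(dplyr)
library(tidygraph)

# same edges as in the question
edges <- data.frame(
  from = c("a0", "a1", "a2", "a3", "b0", "b1", "c0", "c1", "a2", "k1"),
  to   = c("a1", "a2", "a3", "a4", "b1", "a3", "c1", "c2", "k1", "k2")
)

routes_tidy <- as_tbl_graph(edges, directed = TRUE)

# tag each node with the id of its weakly connected component
routes_tidy <- routes_tidy |>
  activate(nodes) |>
  mutate(component = group_components(type = "weak"))

# nodes are the active table, so as_tibble() returns the node data
node_tbl <- as_tibble(routes_tidy)
n_components <- n_distinct(node_tbl$component)

# igraph can report the same count directly
n_components_igraph <- count_components(routes_tidy, mode = "weak")
```

For the example data both counts should be 2. The component column can also be mapped to a colour aesthetic in ggraph to check the grouping visually.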
    



    Edit

    To get the maximum height per initial node and the average of those maxima per sub-tree, select the initial nodes (those whose names contain "0") and apply height to them only.

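    # initial nodes are picked by a "0" in their name (specific to this example's naming)
    initials_list <- lapply(sp, \(x) x[grep("0", x)])
    # height of each initial node within its own component
    sub_tree_max_height <- Map(\(g, v) sapply(v, height, g = g), sub_tree_list, initials_list)
    # average of those heights per component
    sapply(sub_tree_max_height, mean)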
    #>   1   2 
    #> 3.5 2.0
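
Picking initial nodes by a "0" in the name only works for this naming scheme (it would also match a name like "a10"). A possible variation, sketched here as an assumption rather than as the answer's method, treats every vertex with no incoming edge as an initial node. The names roots, memb, root_height and avg_height are made up for illustration, and height is rewritten to measure outgoing distances from v, which is the same quantity as above. Note that distances() returns shortest-path lengths, so this equals the longest chain only when a vertex cannot be reached by both a short and a long route.

```r
library(igraph)

# same edges as in the question
edges <- data.frame(
  from = c("a0", "a1", "a2", "a3", "b0", "b1", "c0", "c1", "a2", "k1"),
  to   = c("a1", "a2", "a3", "a4", "b1", "a3", "c1", "c2", "k1", "k2")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# largest finite shortest-path distance from v to any downstream vertex
height <- function(v, g) {
  d <- distances(g, v = v, mode = "out")
  max(d[is.finite(d)])
}

# initial nodes: vertices with no incoming edge
roots <- names(which(degree(g, mode = "in") == 0))

# component id of every vertex (named by vertex name)
memb <- components(g, mode = "weak")$membership

# height of each initial node, then the mean per component
root_height <- sapply(roots, height, g = g)
avg_height <- tapply(root_height, memb[roots], mean)
```

Working on the full graph instead of the induced sub-graphs gives the same heights, because vertices in other components are at infinite distance and are dropped before taking the maximum.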
    
