r tidygraph

Return list index from a list of tidygraph objects in R?


I have a list of multiple tidygraph objects, and what I'm trying to do is return the index of a specific tidygraph object selected by the user. Hopefully my example below will explain the problem.

(ASIDE: I have attempted a solution, shown below, but at the moment it is very slow to run. I'm hoping to come up with a different, faster solution.)

To begin, I create some data to turn into tidygraph objects, then I create the tidygraph objects and put them all together into a list:

library(tidygraph)

# create some data for the tbl_graph
nodes <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
                    level = c(1,1,1,1),
                    rank = c(1,1,1,1))

nodes1 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
                    level = c(1,1,1,1),
                    rank = c(2,2,2,2))

nodes2 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
                    level = c(1,1,1,1),
                    rank = c(3,3,3,3))

nodes3 <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
                    level = c(2,2,2,2),
                    rank = c(1,1,1,1))


edges <- data.frame(from = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
                    to = c(2, 3, 4, 1, 1, 2, 1, 2, 3))


# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges)
tg_2 <- tbl_graph(nodes = nodes2, edges = edges)
tg_3 <- tbl_graph(nodes = nodes3, edges = edges)

# put into list
myList <- list(tg, tg_1, tg_2, tg_3)

For clarity, the 1st list element looks like this:

> myList[1]
[[1]]
# A tbl_graph: 4 nodes and 9 edges
#
# A directed simple graph with 1 component
#
# Node Data: 4 × 3 (active)
  name   level  rank
  <chr>  <dbl> <dbl>
1 Hadley     1     1
2 David      1     1
3 Romain     1     1
4 Julia      1     1
#
# Edge Data: 9 × 2
   from    to
  <int> <int>
1     1     2
2     1     3
3     1     4
# … with 6 more rows

We can see that each object has a variable called level and another called rank. What I'm trying to do is return the list index of an object by selecting the level and rank number. So, for example, if I select level = 1 and rank = 2, my function would return the index of the object with those values (in this case the 2nd list element). My attempted solution is below, but it's very slow. Is there a better way to achieve what I want?

My Attempted Solution

In my solution, I begin by turning each of the tidygraph objects into a pair of tibbles (nodes and edges) to make them easier to manipulate. This is why my function is so slow: in my data, I could have up to 200,000 tidygraph objects in a list, so going through them and converting them all to tibbles takes a long time. I do that like so:

# separating out the list to make it easier to manipulate
list_obj <- lapply(myList, function(x){
  edges <- tidygraph::activate(x, edges) %>% tibble::as_tibble()
  nodes <- tidygraph::activate(x, nodes) %>% tibble::as_tibble()
  return(list(edges = edges, nodes = nodes))
} )

And then this is the function I actually use to extract the index of the chosen object:

# this function returns the tree index asked for by user
getTreeListNumber <- function(listObj, level, rank){
  
  res <- 0
  listNumber <- NA
  
  for(i in seq_along(listObj)){
    res <- level %in% listObj[[i]]$nodes$level && rank %in% listObj[[i]]$nodes$rank
    if(res == TRUE){
      listNumber <- i
    }
  }
  return(listNumber)
}

For example:

> getTreeListNumber(list_obj, level = 1, rank = 2)
[1] 2

By selecting the level and rank, the function returns the object's index within the list. But is there a faster way to achieve this result?


Solution

  • You may try -

    getTreeListNumber <- function(listObj, level, rank){
      which(sapply(listObj, function(x) {
        nodes <- tidygraph::activate(x, nodes) %>% tibble::as_tibble() 
        all(nodes$level == level & nodes$rank == rank)
      }))
    }
    
    getTreeListNumber(myList, 1, 2)
    #[1] 2
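
Since the tibble conversion is what the question identifies as the slow step, a possible variation is to skip it and read the node attributes directly. This is only a sketch: it relies on a tbl_graph also being an igraph object, so igraph::vertex_attr() can be called on it, and the names make_graph, get_tree_index and index_tbl are hypothetical helpers introduced here, not part of the original post. No timings were measured, so whether it is fast enough for 200,000 objects would need to be checked on the real data.

```r
library(tidygraph)

# Hypothetical helper: rebuild small graphs like the ones in the question
make_graph <- function(level, rank) {
  nodes <- data.frame(name = c("Hadley", "David", "Romain", "Julia"),
                      level = level, rank = rank)
  edges <- data.frame(from = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
                      to   = c(2, 3, 4, 1, 1, 2, 1, 2, 3))
  tbl_graph(nodes = nodes, edges = edges)
}

graphs <- list(make_graph(1, 1), make_graph(1, 2),
               make_graph(1, 3), make_graph(2, 1))

# Hypothetical helper: check node attributes without converting to a tibble.
# vapply() is used instead of sapply() so the result is always logical.
get_tree_index <- function(graph_list, level, rank) {
  hits <- vapply(graph_list, function(g) {
    all(igraph::vertex_attr(g, "level") == level &
        igraph::vertex_attr(g, "rank") == rank)
  }, logical(1))
  which(hits)
}

get_tree_index(graphs, level = 1, rank = 2)

# For repeated lookups, build a small index once.
# Assumption: level and rank are constant within each graph (as in the
# example data), so the first node's values describe the whole graph.
index_tbl <- data.frame(
  level = vapply(graphs, function(g) igraph::vertex_attr(g, "level")[1], numeric(1)),
  rank  = vapply(graphs, function(g) igraph::vertex_attr(g, "rank")[1], numeric(1))
)
which(index_tbl$level == 2 & index_tbl$rank == 1)
```

With the index table, each later query is a plain vector comparison rather than another pass over the graph objects, which may matter if the user selects many level/rank pairs in one session.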