
Count the frequency of identical columns in a list of tidygraph objects in R?


I have some tidygraph objects contained in a list. I am trying to count how often identical columns (within the tidygraph node data) occur across the list.

For example,

if I create some nodes and edge data, turn them into tidygraph objects, and put them in a list, like so:

library(tidygraph)

# create some node and edge data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
                    val = c(1, 5, 2))
nodes2 <- data.frame(name = c("x4", NA, NA),
                    val = c(3, 2, 2))
nodes3 <- data.frame(name = c("x4", NA, NA),
                     val = c(5, 6, 7))
nodes4 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
                     val = c(3, 2, 2, 1, 1, 2, 7))
nodes5 <- data.frame(name= c("x1", "x2", NA),
                     val = c(7, 4, 2))
nodes6 <- data.frame(name = c("x1", "x2", NA),
                     val = c(2, 1, 3))

edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                     to    = c(2, 3, 4, 5, 6, 7))

# create the tbl_graphs
tg   <- tbl_graph(nodes = nodes,  edges = edges)
tg_1 <- tbl_graph(nodes = nodes2, edges = edges)
tg_2 <- tbl_graph(nodes = nodes3, edges = edges)
tg_3 <- tbl_graph(nodes = nodes4, edges = edges1)
tg_4 <- tbl_graph(nodes = nodes5, edges = edges)
tg_5 <- tbl_graph(nodes = nodes6, edges = edges)


# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)

We can see that tg, tg_1, and tg_2 all have identical name columns. Similarly, tg_4 and tg_5 have identical name columns in the node data.
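One way to confirm this kind of claim is to compare the name columns directly. The following is only a minimal sketch on three small graphs built like the ones above; `g_a`, `g_b`, `g_c` and the helper `name_of()` are hypothetical names introduced for illustration.

```r
library(tidygraph)

edges <- data.frame(from = c(1, 1), to = c(2, 3))

# Two graphs with the same name column (val differs) and one with other names
g_a <- tbl_graph(nodes = data.frame(name = c("x4", NA, NA), val = c(1, 5, 2)),
                 edges = edges)
g_b <- tbl_graph(nodes = data.frame(name = c("x4", NA, NA), val = c(3, 2, 2)),
                 edges = edges)
g_c <- tbl_graph(nodes = data.frame(name = c("x1", "x2", NA), val = c(7, 4, 2)),
                 edges = edges)

# Hypothetical helper: extract the name column from the (active) node data
name_of <- function(g) dplyr::pull(g, name)

same_ab <- identical(name_of(g_a), name_of(g_b))  # names match, val is ignored
same_ac <- identical(name_of(g_a), name_of(g_c))  # names differ
```

Here `same_ab` should come out `TRUE` and `same_ac` `FALSE`, since only the name column is compared.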

I'm trying to come up with a way to count the frequency of tidygraph objects that have identical name columns. I would like to be able to return a list of the tidygraph objects with maybe another column added that displays the frequency. In my case, the val column isn't important, so my desired output would look something like this:

 [[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <dbl>
1 x4            3
2 NA            3
3 NA            3
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[2]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
  name  frequency
  <chr>     <dbl>
1 x1            2
2 x2            2
3 NA            2
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

[[3]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
  name  frequency
  <chr>     <dbl>
1 x4            1
2 x2            1
3 NA            1
4 NA            1
5 x1            1
6 NA            1
# … with 1 more row
#
# Edge Data: 6 × 2
   from    to
  <int> <int>
1     1     2
2     2     3
3     2     4
# … with 3 more rows

To be clear, in my above example, the name column containing x4, NA, NA appears 3 times in my original list of objects. Hence the frequency count of 3. Similarly, the name column equal to x1, x2, NA appears 2 times in myList, so it gets a frequency of 2... etc.

However, I'm open to any clever suggestions as to the best way to return the frequency information.


Solution

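  • Since tidygraph plays nicely with the tidyverse, we can use dplyr syntax directly to manipulate the node data. First, each graph's name column is collapsed into a single string key, with NA replaced by a placeholder. Grouping those keys with group_by() and using n() then numbers the occurrences within each group, counting down from the group size to 1 ("frequency" is probably not the right term for this). Finally, purrr::imap() passes each list element as .x and its index as .y, so freqs[.y] picks the value for that element, and mutate() recycles that single value across all of its node rows.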

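    library(tidygraph)
    library(dplyr)   # pull(), group_by(), mutate(), select(), n()
    library(tidyr)   # replace_na()

    # Collapse each graph's name column into a single key string
    freqs <- lapply(myList, function(x) {
      x %>%
        pull(name) %>%
        replace_na("..") %>%
        paste0(collapse = "")
    }) %>%
      unlist(use.names = FALSE) %>%
      as_tibble() %>%
      group_by(value) %>%
      # Count down within each group of identical keys: n, n - 1, ..., 1
      mutate(val = n():1) %>%
      pull(val)

    # Attach the value for list position .y to every node of that graph
    purrr::imap(myList, ~ .x %>%
                  mutate(frequency = freqs[.y]) %>%
                  select(name, frequency))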
    [[1]]
    # Node Data: 3 x 2 (active)
      name  frequency
      <chr>     <int>
    1 x4            3
    2 NA            3
    3 NA            3
    
    # Edge Data: 2 x 2
       from    to
      <int> <int>
    1     1     2
    2     1     3
    
    [[2]]
    # Node Data: 3 x 2 (active)
      name  frequency
      <chr>     <int>
    1 x4            2
    2 NA            2
    3 NA            2
    
    # Edge Data: 2 x 2
       from    to
      <int> <int>
    1     1     2
    2     1     3
    
    [[3]]
    # Node Data: 3 x 2 (active)
      name  frequency
      <chr>     <int>
    1 x4            1
    2 NA            1
    3 NA            1
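The output above counts down within each group of identical name columns. If the total count per distinct name column is wanted instead, with one graph kept per group as in the question's desired output, a variation along the following lines might work. This is a sketch rather than a definitive implementation: the `"<NA>"` placeholder, the `"|"` separator, and the names `keys`, `freq`, `first_idx`, and `result` are assumptions made for illustration, and the data is rebuilt here only so the block runs on its own.

```r
library(tidygraph)
library(dplyr)

# Rebuild a list like myList from the question (val is irrelevant here)
edges  <- data.frame(from = c(1, 1), to = c(2, 3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                     to   = c(2, 3, 4, 5, 6, 7))
name_cols <- list(c("x4", NA, NA), c("x4", NA, NA), c("x4", NA, NA),
                  c("x4", "x2", NA, NA, "x1", NA, NA),
                  c("x1", "x2", NA), c("x1", "x2", NA))
myList <- lapply(name_cols, function(nm) {
  tbl_graph(nodes = data.frame(name = nm, val = 1),
            edges = if (length(nm) == 3) edges else edges1)
})

# One key per graph: the name column collapsed to a string, NA made visible
keys <- vapply(myList, function(g) {
  nm <- pull(g, name)
  paste(ifelse(is.na(nm), "<NA>", nm), collapse = "|")
}, character(1))

# For every graph, the total number of graphs sharing its key
freq <- ave(seq_along(keys), keys, FUN = length)

# Keep the first graph of each distinct key, most frequent first
first_idx <- which(!duplicated(keys))
first_idx <- first_idx[order(-freq[first_idx])]

result <- lapply(first_idx, function(i) {
  myList[[i]] %>%
    mutate(frequency = freq[i]) %>%   # single value recycled over all nodes
    select(name, frequency)
})
```

Using a separator and a visible NA placeholder in the key reduces the chance that two different name columns collapse to the same string, which can happen when the pieces are pasted together with no separator.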