
Count identical columns of a matrix containing lists in R?


The data I'm dealing with is a matrix, where each column in the matrix is a list with 2 elements. What I'm trying to do is count how many of the columns are identical.

I am extracting the matrix of lists from tidygraph objects. The example below should make the issue clearer. To begin, I create some data, turn it into tidygraph objects, and put them all into a list, like so:

library(ggraph)
library(tidygraph)

# create some nodes and edges data
nodes  <- data.frame(name = c("x4", NA, NA))
nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA))
nodes2 <- data.frame(name = c("x1", NA, NA))
nodes3 <- data.frame(name = c("x6", NA, NA))
nodes4 <- data.frame(name = c("x10", "x3", NA, NA, NA))
nodes5 <- data.frame(name = c("x1", "x2", NA, NA, "x7", NA, NA))

edges  <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7))
edges2 <- data.frame(from = c(1,1), to = c(2,3))
edges3 <- data.frame(from = c(1,1), to = c(2,3))
edges4 <- data.frame(from = c(1,2,2,1), to = c(2,3,4,5))
edges5 <- data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7))


# create the tbl_graphs
tg   <- tbl_graph(nodes = nodes,  edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges1)
tg_2 <- tbl_graph(nodes = nodes2, edges = edges2)
tg_3 <- tbl_graph(nodes = nodes3, edges = edges3)
tg_4 <- tbl_graph(nodes = nodes4, edges = edges4)
tg_5 <- tbl_graph(nodes = nodes5, edges = edges5)


# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)

For clarity, the first element of myList looks like this:

myList[1]
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 1 (active)
  name 
  <chr>
1 x4   
2 NA   
3 NA   
#
# Edge Data: 2 × 2
   from    to
  <int> <int>
1     1     2
2     1     3

Essentially, I want to go through each element of the list, look at its edges data, and count how many of the edge tables are identical. I'm sure there are multiple ways to do this, but I tried using tidygraph's activate() to extract just the edges data, which gives back the matrix of lists:

# extract just the edges data
resEdges <- sapply(myList, function(x) {
  tidygraph::activate(x, edges) %>%
    tibble::as_tibble()
})

Again, for clarity, the first column of resEdges looks like this:

> resEdges[,1]
$from
[1] 1 1

$to
[1] 2 3
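
As a side note on why resEdges is a matrix of lists rather than a plain list: every edge tibble is itself a list of two columns, so sapply() simplifies the equal-length results into a matrix. The following is a minimal sketch of that behaviour using plain data frames as stand-ins for the edge tables; the names edge_tables, m and l are hypothetical and not part of the question's code.

```r
# Hypothetical stand-ins for the edge tables: plain data frames,
# so the sketch runs without tidygraph.
edge_tables <- list(
  data.frame(from = c(1L, 1L), to = c(2L, 3L)),
  data.frame(from = c(1L, 2L, 2L, 1L), to = c(2L, 3L, 4L, 5L)),
  data.frame(from = c(1L, 1L), to = c(2L, 3L))
)

# Each data frame is a list of 2 columns, so sapply() simplifies the
# results into a 2-row matrix whose cells hold the column vectors.
m <- sapply(edge_tables, function(x) x)

dim(m)       # rows are the columns "from"/"to", one column per graph
rownames(m)
m[, 1]       # a list with $from and $to, as in the question

# lapply() skips the simplification and keeps one data frame per graph,
# which can then be compared directly.
l <- lapply(edge_tables, function(x) x)
identical(l[[1]], l[[3]])
```

Keeping the results in a plain list (via lapply()) is often easier to work with than the list-matrix, since each element stays a complete edge table.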

So what I'm trying to do is go through the columns of resEdges and count how often each distinct column occurs.

In my example, there are only 3 unique columns. So, my desired output would look something like this:

> edgeFreq
# A tibble: 3 × 3
  from             to             frequency
  1 1              2 3            3
  1 2 2 1 5 5      2 3 4 5 6 7    2
  1 2 2 1          2 3 4 5        1

Solution

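library(tidygraph)  # activate(), as_tibble(), %>%
library(purrr)      # map_chr()
library(stringr)    # str_c()
library(tidyr)      # separate()

myList %>%
  # collapse each graph's edge table to one string, e.g. "1 1, 2 3"
  map_chr(~ as_tibble(activate(.x, edges)) %>%
            map_chr(str_c, collapse = " ") %>%
            toString()) %>%
  table() %>%
  as_tibble() %>%
  setNames(c("data", "frequency")) %>%
  separate(data, c("from", "to"), ", ")

# A tibble: 3 x 3
  from        to          frequency
  <chr>       <chr>           <int>
1 1 1         2 3                 3
2 1 2 2 1 5 5 2 3 4 5 6 7         2
3 1 2 2 1     2 3 4 5             1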
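
The same counting idea can also be sketched in base R, without purrr, stringr or tidyr. This is only an illustration under stated assumptions: the edge tables are taken to be already extracted into plain data frames (rebuilt by hand here from the question's data), and edge_tables, collapse_col and keys are hypothetical names.

```r
# Assumption: the edge tables have already been pulled out of the
# tbl_graphs; they are rebuilt here by hand from the question's data.
edge_tables <- list(
  data.frame(from = c(1L, 1L), to = c(2L, 3L)),
  data.frame(from = c(1L, 2L, 2L, 1L, 5L, 5L), to = c(2L, 3L, 4L, 5L, 6L, 7L)),
  data.frame(from = c(1L, 1L), to = c(2L, 3L)),
  data.frame(from = c(1L, 1L), to = c(2L, 3L)),
  data.frame(from = c(1L, 2L, 2L, 1L), to = c(2L, 3L, 4L, 5L)),
  data.frame(from = c(1L, 2L, 2L, 1L, 5L, 5L), to = c(2L, 3L, 4L, 5L, 6L, 7L))
)

# Collapse one column to a single string, e.g. c(1, 1) becomes "1 1".
collapse_col <- function(v) paste(v, collapse = " ")

# One row per graph, holding the collapsed from/to strings.
keys <- data.frame(
  from = vapply(edge_tables, function(e) collapse_col(e$from), character(1)),
  to   = vapply(edge_tables, function(e) collapse_col(e$to), character(1))
)

# Count how often each (from, to) pair occurs by summing a column of ones.
edgeFreq <- aggregate(frequency ~ from + to,
                      data = cbind(keys, frequency = 1L),
                      FUN = sum)
edgeFreq
```

Like the tidyverse version, this compares graphs by a string key, so two edge tables count as identical only if their rows appear in the same order.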