I have some tidygraph objects in a list, and I am trying to count how often identical columns occur in their node data.
For example, if I create some node and edge data, turn them into tidygraph objects, and put them in a list, like so:
library(tidygraph)
# create some node and edge data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
val = c(1, 5, 2))
nodes2 <- data.frame(name = c("x4", NA, NA),
val = c(3, 2, 2))
nodes3 <- data.frame(name = c("x4", NA, NA),
val = c(5, 6, 7))
nodes4 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
val = c(3, 2, 2, 1, 1, 2, 7))
nodes5 <- data.frame(name= c("x1", "x2", NA),
val = c(7, 4, 2))
nodes6 <- data.frame(name = c("x1", "x2", NA),
val = c(2, 1, 3))
edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
to = c(2, 3, 4, 5, 6, 7))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes2, edges = edges)
tg_2 <- tbl_graph(nodes = nodes3, edges = edges)
tg_3 <- tbl_graph(nodes = nodes4, edges = edges1)
tg_4 <- tbl_graph(nodes = nodes5, edges = edges)
tg_5 <- tbl_graph(nodes = nodes6, edges = edges)
# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)
We can see that tg, tg_1, and tg_2 all have identical name columns. Similarly, tg_4 and tg_5 have identical name columns in the node data.
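As a small aside on what "identical" means here, the sketch below compares name vectors copied by hand from the node data above (the variable names `name_tg`, `name_tg_1`, and `name_tg_4` are made up for illustration). It uses `identical()` rather than `==`, because `==` returns `NA` wherever either side is `NA`, whereas `identical()` treats matching `NA` positions as equal.

```r
# Name columns copied from the node data frames defined above
name_tg   <- c("x4", NA, NA)    # from nodes
name_tg_1 <- c("x4", NA, NA)    # from nodes2
name_tg_4 <- c("x1", "x2", NA)  # from nodes5

# identical() compares whole vectors and treats matching NAs as equal
same_a <- identical(name_tg, name_tg_1)  # TRUE
same_b <- identical(name_tg, name_tg_4)  # FALSE
```

On the actual graphs, the same comparison could presumably be made on `pull(tg, name)` and `pull(tg_1, name)`.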
I'm trying to come up with a way to count the frequency of tidygraph objects that have identical name columns. I would like to be able to return a list of the tidygraph objects with maybe another column added that displays the frequency.
In my case, the val column isn't important, so my desired output would look something like this:
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <dbl>
1 x4 3
2 NA 3
3 NA 3
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[2]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <dbl>
1 x1 2
2 x2 2
3 NA 2
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[3]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
name frequency
<chr> <dbl>
1 x4 1
2 x2 1
3 NA 1
4 NA 1
5 x1 1
6 NA 1
# … with 1 more row
#
# Edge Data: 6 × 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# … with 3 more rows
To be clear, in my above example, the name column containing x4, NA, NA appears 3 times in my original list of objects, hence the frequency count of 3. Similarly, the name column equal to x1, x2, NA appears 2 times in myList, so it gets a frequency of 2, and so on.
However, I'm open to any clever suggestions as to the best way to return the frequency information.
Since tidygraph plays nicely with the tidyverse, we can use dplyr syntax directly to manipulate elements. To compute the frequencies (probably not the right term here, since what this produces is a count that decrements with each repeated occurrence), group_by() followed by n() can be used. Because myList is unnamed, imap() passes each element's position as .y, so freqs[.y] picks out the value for that graph, and mutate() recycles that single value down the new column.
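To see the grouping step described above in isolation, here is a minimal sketch on a plain tibble of keys. The `keys` data and the column names `countdown` and `total` are invented for illustration; the sketch only contrasts `n():1`, which counts down within each group, with a bare `n()`, which repeats the group size for every member.

```r
library(dplyr)

# Hypothetical keys, one per list element, in list order
keys <- tibble(value = c("a", "a", "b", "a", "b"))

counts <- keys %>%
  group_by(value) %>%
  mutate(countdown = n():1,  # counts down within each group
         total = n()) %>%    # group size, repeated for every member
  ungroup()

counts$countdown  # 3 2 2 1 1
counts$total      # 3 3 2 3 2
```

If the total number of matching objects is what is wanted, as in the question's desired output, `total` would be the column to use.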
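library(dplyr)  # pull(), group_by(), mutate(), n(), select()
library(tidyr)  # replace_na()
# build one string key per graph from its name column
freqs <- lapply(myList, function(x){
  x %>%
    pull(name) %>%
    replace_na("..") %>%
    paste0(collapse = "")
}) %>%
  unlist(use.names = FALSE) %>%
  as_tibble() %>%
  group_by(value) %>%
  # count down within each group of identical keys: n, n - 1, ..., 1
  mutate(val = n():1) %>%
  pull(val)
# myList is unnamed, so .y is the position of each graph in the list
purrr::imap(myList, ~.x %>%
  mutate(frequency = freqs[.y]) %>%
  select(name, frequency))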
[[1]]
# Node Data: 3 x 2 (active)
name frequency
<chr> <int>
1 x4 3
2 NA 3
3 NA 3
# Edge Data: 2 x 2
from to
<int> <int>
1 1 2
2 1 3
[[2]]
# Node Data: 3 x 2 (active)
name frequency
<chr> <int>
1 x4 2
2 NA 2
3 NA 2
# Edge Data: 2 x 2
from to
<int> <int>
1 1 2
2 1 3
[[3]]
# Node Data: 3 x 2 (active)
name frequency
<chr> <int>
1 x4 1
2 NA 1
3 NA 1
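The output above counts down (3, 2, 1) across graphs that share a name column, whereas the question asks for one graph per distinct name column with the total count attached. The following is only a sketch of one way that might be done, not the answer's method. It uses a small stand-in list called `graphs` instead of myList, and the helper names `keys`, `first_idx`, and `result` are made up. It also assumes no node is literally named "NA" and no name contains "|", since either would make the string keys ambiguous.

```r
library(tidygraph)
library(dplyr)

# Small stand-in for myList (only the name column matters here)
edges <- data.frame(from = c(1, 1), to = c(2, 3))
graphs <- list(
  tbl_graph(nodes = data.frame(name = c("x4", NA, NA)), edges = edges),
  tbl_graph(nodes = data.frame(name = c("x4", NA, NA)), edges = edges),
  tbl_graph(nodes = data.frame(name = c("x1", "x2", NA)), edges = edges)
)

# One string key per graph; paste() turns NA into the text "NA"
keys <- vapply(graphs,
               function(g) paste(pull(g, name), collapse = "|"),
               character(1))

# Positions of the first graph for each distinct key
first_idx <- which(!duplicated(keys))

# Keep those graphs and attach the number of graphs sharing their key
result <- lapply(first_idx, function(i) {
  graphs[[i]] %>%
    mutate(frequency = sum(keys == keys[i])) %>%
    select(name, frequency)
})
```

The result keeps graphs in order of first appearance. Sorting by frequency, as in the desired output, would need an extra ordering step.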