I would like to write a function "f" in R that takes a data.frame of edges between individuals and one individual (A2, for instance), and returns another data.frame containing only the "ancestors" and "children" of A2, together with the ancestors of those ancestors and the children of those children.
To illustrate the problem:
library(visNetwork)
nodes <- data.frame(id    = c(paste0("A", 1:5), paste0("B", 1:3)),
                    label = c(paste0("A", 1:5), paste0("B", 1:3)))
edges <- data.frame(from = c("A1", "A1", "A2", "A3", "A4", "B1", "B2"),
                    to   = c("A2", "A3", "A4", "A4", "A5", "B3", "B3"))
visNetwork(nodes, edges) %>%
  visNodes(font = list(size = 45)) %>%
  visHierarchicalLayout(direction = "LR", levelSeparation = 500)
In this example, the data.frame contains two independent networks: one with the "A"s and another with the "B"s.
I would like to implement a function f(data = edges, indiv = "A2") that returns every row of the edges data.frame that belongs to the network of "A"s.
f(edges, "A2") would return this subset of edges:
head(f(edges, "A2"))
#   from to
# 1   A1 A2
# 2   A1 A3
# 3   A2 A4
# 4   A3 A4
# 5   A4 A5
I hope it is clear enough for you to help me.
Thanks a lot !
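I've written a simple algorithm that finds the whole family linked to an individual (I'm sure it can be improved). It starts from the individual, collects its direct children and ancestors, and repeats this for every newly found person until nobody new turns up. As @romles suggested, you can do the same thing with R packages such as igraph. However, on this small example, my function seems a bit faster than the igraph option.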
edges <- data.frame(from = c("A1", "A1", "A2", "A3", "A4", "B1", "B2"),
                    to   = c("A2", "A3", "A4", "A4", "A5", "B3", "B3"),
                    stringsAsFactors = FALSE)
f <- function(data, indiv){
  children_ancestors <- function(indiv){
    # Find the direct children and direct ancestors of an individual
    c(data[data[, "from"] == indiv, "to"], data[data[, "to"] == indiv, "from"])
  }
  family <- indiv
  new_people <- children_ancestors(indiv) # New people to inspect
  while(length(diff_new_p <- setdiff(new_people, family)) > 0){
    # Some of the new people are not in the family yet: add them,
    # then look up their own children and ancestors
    family <- c(family, diff_new_p)
    new_people <- unlist(sapply(diff_new_p, children_ancestors))
    new_people <- unique(new_people)
  }
  # Keep every edge that touches a member of the family
  data[(data[, 1] %in% family) | (data[, 2] %in% family), ]
}
f(edges, "A2")
This gives the expected result. Comparing it with the igraph option:
library(igraph)
library(microbenchmark)
edges2 <- graph_from_data_frame(edges, directed = FALSE)
microbenchmark(simple_function = f(edges, "A2"),
               igraph_option = as_data_frame(subgraph.edges(edges2, subcomponent(edges2, 'A2', 'in')))
)
#Unit: microseconds
#            expr      min       lq     mean   median       uq      max neval
# simple_function  874.411  968.323 1206.037 1123.515 1325.075 2957.931   100
#   igraph_option 1239.896 1451.364 1802.341 1721.227 1984.380 3907.089   100
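One caveat about the igraph call in the benchmark: subcomponent() returns vertices, while subgraph.edges() expects edge ids, so it is worth checking that this expression really selects the same rows as f before relying on the timings. A minimal sketch of an igraph-based alternative follows. The helper name f_igraph is hypothetical (it is not part of igraph), and this is one possible formulation under that assumption, not necessarily the fastest one. It asks igraph only for the members of the component and then filters the original data.frame:

```r
library(igraph)

edges <- data.frame(from = c("A1", "A1", "A2", "A3", "A4", "B1", "B2"),
                    to   = c("A2", "A3", "A4", "A4", "A5", "B3", "B3"),
                    stringsAsFactors = FALSE)

# Hypothetical helper (name chosen for illustration only)
f_igraph <- function(data, indiv) {
  g <- graph_from_data_frame(data, directed = TRUE)
  # mode = "all" ignores edge direction, so this collects every vertex
  # reachable from indiv through parents or children
  members <- as_ids(subcomponent(g, indiv, mode = "all"))
  # Keep the rows of the original data.frame that touch a member
  data[data$from %in% members | data$to %in% members, ]
}

res_a <- f_igraph(edges, "A2")
res_b <- f_igraph(edges, "B1")
```

Because the result is taken from the original data.frame, the row order and column names match those returned by f, which makes the two approaches easy to compare with identical() or all.equal().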