Tags: r, network-programming, children, ancestor

Function in R which returns ancestors and children in a network


I would like to create a function "f" in R that takes as input a data.frame of edges between individuals and one individual (for instance A2), and returns another data.frame containing only the "ancestors" and "children" of A2, together with the ancestors of those ancestors, the children of those children, and so on.

To illustrate the problem:

 library(visNetwork)
 nodes <- data.frame(id = c(paste0("A",1:5),paste0("B",1:3)),
                label = c(paste0("A",1:5),paste0("B",1:3)))
 edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                to = c("A2","A3","A4","A4","A5","B3","B3"))
 visNetwork(nodes, edges) %>% 
   visNodes(font = list(size=45)) %>% 
    visHierarchicalLayout(direction = "LR", levelSeparation = 500)


In this example, the data.frame contains two independent networks: one with the "A"s and another with the "B"s.

I would like to implement a function f(data = edges, indiv = "A2") that returns every row of the data.frame edges that belongs to the network of "A"s.

f(edges, "A2") would return this subset of the data.frame edges:

 head(f(edges,"A2"))
 #  from to
 #1   A1 A2
 #2   A1 A3
 #3   A2 A4
 #4   A3 A4
 #5   A4 A5

I hope it is clear enough for you to help me.

Thanks a lot !


Solution

  • I've written a simple algorithm that finds the whole family linked to an individual (I'm sure it can be improved). As @romles suggested, you can do the same thing with R packages such as igraph. However, in this case, my function seems a bit faster than the igraph option.

    edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                        to = c("A2","A3","A4","A4","A5","B3","B3"),
                        stringsAsFactors = FALSE)
    f <- function(data, indiv){
        children_ancestors <- function(indiv){
            # Find children and ancestors of an indiv
            c(data[data[,"from"]==indiv,"to"],data[data[,"to"]==indiv,"from"])
        }
        family <- indiv
        new_people <- children_ancestors(indiv) # New people to inspect
        while(length(diff_new_p <- setdiff(new_people,family)) > 0){
            # if the new people aren't yet in the family :
            family <- c(family, diff_new_p)
            new_people <- unlist(sapply(diff_new_p, children_ancestors))
            new_people <- unique(new_people)
        }
        data[(data[,1] %in% family) | (data[,2] %in% family),]
    }
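
    The loop above expands the family one "generation" at a time. As a rough sketch of the same idea, the lookup can be done for the whole frontier at once with %in% instead of calling a helper per individual. The name family_edges is hypothetical (it is not part of the answer above), and this variant has not been benchmarked:

```r
edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: same breadth-first expansion as f(), but each pass
# looks up the neighbours of every newly found individual in one step.
family_edges <- function(data, indiv) {
  family   <- indiv   # everyone found so far
  frontier <- indiv   # individuals whose neighbours still need inspecting
  while (length(frontier) > 0) {
    # children ("to") and ancestors ("from") of the whole frontier
    neighbours <- c(data$to[data$from %in% frontier],
                    data$from[data$to %in% frontier])
    # keep only people not seen before; setdiff() also removes duplicates
    frontier <- setdiff(neighbours, family)
    family   <- c(family, frontier)
  }
  # keep every edge that touches a family member
  data[data$from %in% family | data$to %in% family, , drop = FALSE]
}

res_a <- family_edges(edges, "A2")  # rows of the "A" network
res_b <- family_edges(edges, "B1")  # rows of the "B" network
```

    Here res_a holds the five "A" edges and res_b the two "B" edges, so each call stays inside the network of the chosen individual.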
    

    Calling f(edges, "A2") gives the expected result. Here is a comparison with the igraph option:

    library(igraph)
    library(microbenchmark)
    edges2 <- graph_from_data_frame(edges, directed = FALSE)
    microbenchmark(simple_function = f(edges,"A2"),
                   igraph_option = as_data_frame(subgraph.edges(edges2, subcomponent(edges2, 'A2', 'in')))
                   )
    #Unit: microseconds
    #            expr      min       lq     mean   median       uq      max neval
    # simple_function  874.411  968.323 1206.037 1123.515 1325.075 2957.931   100
    #   igraph_option 1239.896 1451.364 1802.341 1721.227 1984.380 3907.089   100
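
    One caveat on the igraph side: as far as I can tell, subgraph.edges() expects edge ids, whereas subcomponent() returns vertices, so it may be safer to select by vertex. The following is only a sketch under that assumption, not the benchmarked code above, and its timing has not been measured. It uses subcomponent() with mode = "all" to collect everyone connected to A2 and then filters the original data.frame, which keeps the original row order and edge direction:

```r
library(igraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)

g <- graph_from_data_frame(edges, directed = FALSE)

# Names of all vertices reachable from "A2", following edges both ways
members <- as_ids(subcomponent(g, "A2", mode = "all"))

# Both endpoints of an edge always lie in the same component,
# so filtering on one column is enough.
family_edges_igraph <- edges[edges$from %in% members, , drop = FALSE]
```

    Filtering the data.frame rather than converting a subgraph back avoids relying on how igraph orders the endpoints of undirected edges.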