
How to get an edge list (undirected graph) from a grouped dataframe in R?


I have created a network in which people are connected to specific events (for example, the participants in event 1 are connected to the node "Event1").

I would like to know whether it is possible to remove the event nodes and directly link the people who took part in the same event.

I have done something similar in the past using Excel to work on the raw data. I would like to know if there is a faster and better way to do it without leaving R.

The dataset looks like this:

net1
from        to 
Person 1   Event1 
Person 2   Event1 
Person 3   Event2
Person 4   Event2 
Person 5   Event2 
Person 6   Event3
...

As an example, I would like to delete "Event1" and connect Person 1 and Person 2 directly to each other, as I currently do by hand.

I am sorry that I cannot provide any code to work with; I don't know where to start.


Solution

  • We can do this with the tidyverse.

    1. group_split with the keep = FALSE argument splits the data frame by the to column into a list of data frames and drops the grouping variable from the output. (Recent dplyr versions spell this argument .keep; keep is deprecated there.)

    2. map_dfr expands each data frame into all combinations of from with itself (like expand.grid). The _dfr suffix means that the resulting list is row-bound into a single data frame.

    3. pmap_dfr operates on each row of the data frame and sorts its values (sort(c(...))), so that the two endpoints of a link always appear in the same order. set_names restores the column names after sorting, and tibble(!!!...) splices the sorted, named vector into a one-row tibble.

    4. filter and distinct remove self-loops and duplicate links, respectively.

    Note that group_split and group_map (along with its sibling group_modify) are experimental functions, so their interface may change. Please use them with caution.

    library(tidyverse)
    
    net1 %>%
      group_by(to) %>%
      group_split(keep = FALSE) %>%
      map_dfr(expand, crossing(from, to = from)) %>%
      pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
      filter(from != to) %>%
      distinct()
    

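    or with group_modify (this function was called group_map before dplyr 0.8.1; the current group_map returns a list, so the ungroup() step below would fail with it):

    net1 %>%
      group_by(temp = to) %>%
      # group_modify applies the function to each group and returns a grouped tibble
      group_modify(~ expand(.x, crossing(from, to = from))) %>%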
      ungroup() %>%
      select(-temp) %>%
      pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
      filter(from != to) %>%
      distinct()
    

    or with inner_join:

    net1 %>%
      inner_join(net1, by = "to") %>%
      select(from = from.x, to = from.y) %>%
      pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
      filter(from != to) %>%
      distinct()
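
    Step 3 exists only to put each pair into a canonical order, so that A-B and B-A collapse into the same row. As a rough illustration of that idea (a swapped-in base R technique, not the pmap_dfr call used above), pmin() and pmax() can do the same ordering column-wise. The pairs data frame here is made up for the example.

    ```r
    # Hypothetical pairs: a link, its reversed duplicate, and a self-loop
    pairs <- data.frame(
      from = c("Person_2", "Person_1", "Person_1"),
      to   = c("Person_1", "Person_2", "Person_1"),
      stringsAsFactors = FALSE
    )

    # Put the alphabetically smaller name first in every row
    ordered <- data.frame(
      from = pmin(pairs$from, pairs$to),
      to   = pmax(pairs$from, pairs$to),
      stringsAsFactors = FALSE
    )

    # Drop self-loops first, then duplicate links
    edges <- unique(ordered[ordered$from != ordered$to, ])
    ```

    Because pmin() and pmax() are vectorised, this avoids building one tibble per row, which may matter on larger edge lists.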
    

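    We can also use graph_from_data_frame in place of pmap_dfr to build an undirected graph and read its edges back. Load igraph before tidyverse: igraph exports functions whose names clash with tidyverse ones (crossing, for example), and whichever package is attached last masks the other, which can lead to unexpected errors.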

    library(igraph)
    library(tidyverse)
    
    net1 %>%
      inner_join(net1, by = "to") %>%
      select(from = from.x, to = from.y) %>%
      igraph::graph_from_data_frame(directed = FALSE) %>%
      igraph::as_data_frame(what = "edges") %>%
      filter(from != to) %>%
      distinct()
    

    Output:

    # A tibble: 4 x 2
      from     to      
      <chr>    <chr>   
    1 Person_1 Person_2
    2 Person_3 Person_4
    3 Person_3 Person_5
    4 Person_4 Person_5
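
    For comparison, the same four edges can be produced without any packages. The following is a minimal base R sketch, assuming the net1 data frame from this answer (repeated here so the block runs on its own); it swaps the expand/sort/distinct pipeline for combn(), which enumerates unordered pairs directly.

    ```r
    net1 <- data.frame(
      from = c("Person_1", "Person_2", "Person_3", "Person_4", "Person_5", "Person_6"),
      to   = c("Event1", "Event1", "Event2", "Event2", "Event2", "Event3"),
      stringsAsFactors = FALSE
    )

    # One character vector of participants per event
    by_event <- split(net1$from, net1$to)

    # All unordered pairs within each event; an event with a single
    # participant contributes no edges
    pair_list <- lapply(by_event, function(people) {
      people <- sort(unique(people))
      if (length(people) < 2) return(NULL)
      m <- combn(people, 2)  # 2 x n matrix, one pair per column
      data.frame(from = m[1, ], to = m[2, ], stringsAsFactors = FALSE)
    })

    # Stack the per-event pairs and drop links shared by several events
    edges <- unique(do.call(rbind, pair_list))
    rownames(edges) <- NULL
    ```

    Since combn() never pairs an element with itself and the names are sorted first, no separate self-loop filter or row-wise sort is needed.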
    

    Data:

    net1 <- structure(list(from = c("Person_1", "Person_2", "Person_3", "Person_4", 
    "Person_5", "Person_6"), to = c("Event1", "Event1", "Event2", 
    "Event2", "Event2", "Event3")), class = "data.frame", row.names = c(NA, 
    -6L))
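
    Since the person-event data is a two-mode (bipartite) network, igraph's bipartite_projection may also be worth trying. This is a sketch under the assumption that every value in the to column is an event and every value in the from column is a person; it is a different technique from the joins above, not a rewrite of them.

    ```r
    library(igraph)

    net1 <- data.frame(
      from = c("Person_1", "Person_2", "Person_3", "Person_4", "Person_5", "Person_6"),
      to   = c("Event1", "Event1", "Event2", "Event2", "Event2", "Event3"),
      stringsAsFactors = FALSE
    )

    g <- graph_from_data_frame(net1, directed = FALSE)

    # bipartite_projection needs a logical "type" vertex attribute:
    # TRUE for events, FALSE for people
    V(g)$type <- V(g)$name %in% net1$to

    # which = "false" returns only the projection onto the people vertices
    people_net <- bipartite_projection(g, which = "false")

    # Edge list of the people-only network; by default a weight column
    # counts how many events each pair shares
    edges <- igraph::as_data_frame(people_net, what = "edges")
    ```

    Unlike the edge-list approaches, the projected graph keeps Person_6 as an isolated vertex, which may or may not be what you want.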