I have created a network where people are connected to specific events (participants in event 1 are connected to the node "event1").
I would like to know if it is possible to remove the node "event" and directly link together the people that took part in that event.
I have done something similar in the past using Excel to work on the raw data. I would like to know if there is a faster and better way to do it without leaving R.
The dataset looks like this:
net1
from      to
Person 1  Event1
Person 2  Event1
Person 3  Event2
Person 4  Event2
Person 5  Event2
Person 6  Event3
...
As an example, I would like to delete "Event1" and connect Person 1 and Person 2 directly to each other.
I am sorry that I cannot provide any code to work with; I don't know where to start with an operation like this.
We can do this using the tidyverse.

group_split with the .keep = FALSE argument splits the data frame by the to column into a list of data frames, dropping the grouping variable from the output (older dplyr releases spelled this argument keep; that spelling is now deprecated). map_dfr expands each data frame by finding all combinations of from with itself (like expand.grid); the _dfr suffix means that the resulting list is row-bound into a single data frame.

pmap_dfr operates on each row of the data frame and sorts it horizontally (sort(c(...))). set_names is needed to line up the columns after sorting, and tibble(!!! splices the sorted vector into a row of the tibble, effectively turning it into a row vector.

filter and distinct remove self-loops and duplicate links, respectively.

Note that both group_split and group_map are currently experimental functions. Please use with caution.

library(tidyverse)

net1 %>%
  group_by(to) %>%
  group_split(.keep = FALSE) %>%
  map_dfr(expand, crossing(from, to = from)) %>%
  pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
  filter(from != to) %>%
  distinct()
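
or with group_modify (this is what group_map did in dplyr 0.8.0; since dplyr 0.8.1, group_map returns a list, and group_modify is the variant that returns a grouped data frame):

net1 %>%
  group_by(temp = to) %>%
  group_modify(~ expand(.x, crossing(from, to = from))) %>%
  ungroup() %>%
  select(-temp) %>%
  pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
  filter(from != to) %>%
  distinct()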

or with inner_join:

net1 %>%
  inner_join(net1, by = "to") %>%
  select(from = from.x, to = from.y) %>%
  pmap_dfr(~ tibble(!!!sort(c(...)) %>% set_names(c("from", "to")))) %>%
  filter(from != to) %>%
  distinct()
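
The row-wise pmap_dfr sort may become slow on larger edge lists. As a rough sketch of an alternative (a swapped-in technique, not one of the approaches above), each pair can be ordered with the vectorised pmin and pmax, which gives every undirected link a single orientation before distinct is applied. The name edges is only illustrative, and net1 is rebuilt here so that the snippet runs on its own:

```r
library(dplyr)

# Same sample data as in the Data section.
net1 <- data.frame(
  from = paste0("Person_", 1:6),
  to   = c("Event1", "Event1", "Event2", "Event2", "Event2", "Event3"),
  stringsAsFactors = FALSE
)

edges <- net1 %>%
  # Self-join on the event. dplyr >= 1.1 may warn about a many-to-many
  # relationship; that is the intended behaviour here.
  inner_join(net1, by = "to") %>%
  # pmin/pmax put the smaller name (in sort order) first in every pair,
  # so (A, B) and (B, A) become identical rows.
  transmute(from = pmin(from.x, from.y), to = pmax(from.x, from.y)) %>%
  filter(from != to) %>%   # drop self-loops
  distinct()               # drop duplicate links

edges
```

Because pmin and pmax work on whole columns at once, no per-row tibble has to be built.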

We can also use graph_from_data_frame in place of pmap_dfr to return an undirected graph (be sure to load igraph before loading tidyverse; otherwise igraph masks several tidyverse functions of the same name and you might get some unexpected errors):

library(igraph)
library(tidyverse)

net1 %>%
  inner_join(net1, by = "to") %>%
  select(from = from.x, to = from.y) %>%
  igraph::graph_from_data_frame(directed = FALSE) %>%
  igraph::as_data_frame(what = "edges") %>%
  filter(from != to) %>%
  distinct()

Output:

# A tibble: 4 x 2
  from     to
  <chr>    <chr>
1 Person_1 Person_2
2 Person_3 Person_4
3 Person_3 Person_5
4 Person_4 Person_5
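
Since people and events form a two-mode (bipartite) network, igraph's bipartite_projection is another possible route to the same person-to-person links. The following is a sketch under the assumption that every name in the to column is an event and every name in the from column is a person; g and people are illustrative names, and net1 is rebuilt so that the snippet runs on its own:

```r
library(igraph)

# Same sample data as in the Data section.
net1 <- data.frame(
  from = paste0("Person_", 1:6),
  to   = c("Event1", "Event1", "Event2", "Event2", "Event2", "Event3"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(net1, directed = FALSE)

# Mark the two node sets: TRUE for events, FALSE for people
# (assumes no name appears in both columns).
V(g)$type <- V(g)$name %in% net1$to

# Keep only the projection onto the people (type == FALSE).
people <- bipartite_projection(g, which = "false")

igraph::as_data_frame(people, what = "edges")
```

With the default multiplicity = TRUE, the edge list should carry a weight column counting how many events each pair shares. Person_6, who shares no event with anyone, stays in people as an isolated vertex.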
Data:
net1 <- structure(list(from = c("Person_1", "Person_2", "Person_3", "Person_4",
"Person_5", "Person_6"), to = c("Event1", "Event1", "Event2",
"Event2", "Event2", "Event3")), class = "data.frame", row.names = c(NA,
-6L))
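
For comparison, the same pairing can be sketched in base R alone with split and combn. This is not one of the approaches above, only an illustration of the underlying idea (all unordered pairs within each event); pair_up and edges are hypothetical names, and the snippet assumes character rather than factor columns and at least one event with two or more participants:

```r
# Same sample data as above, rebuilt so the snippet runs on its own.
net1 <- data.frame(
  from = paste0("Person_", 1:6),
  to   = c("Event1", "Event1", "Event2", "Event2", "Event2", "Event3"),
  stringsAsFactors = FALSE
)

# All unordered pairs among the participants of one event.
pair_up <- function(people) {
  people <- sort(unique(people))
  if (length(people) < 2) return(NULL)  # a lone participant yields no link
  t(combn(people, 2))                   # one row per pair
}

# One two-column matrix per event, stacked into a single edge list.
pairs <- do.call(rbind, lapply(split(net1$from, net1$to), pair_up))

# unique() drops pairs that met at more than one event.
edges <- unique(data.frame(from = pairs[, 1], to = pairs[, 2],
                           stringsAsFactors = FALSE))
edges
```

Sorting inside pair_up means each pair is generated once, in a fixed order, so no separate self-loop filter is needed.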