Tags: r, grouping, igraph, pairing

Generating group id for pairs of data frames in R


I have six data frames. Some of them have identical structure and values, and others do not. I compared every pair of data frames using identical() in a loop and summarized the comparison results in the following table.

result <- data.frame(source=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5), dest=c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6), TF=c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F"))

> result
source dest TF
1       1    2  T
2       1    3  F
3       1    4  F
4       1    5  F
5       1    6  F
6       2    3  T
7       2    4  F
8       2    5  F
9       2    6  F
10      3    4  F
11      3    5  F
12      3    6  F
13      4    5  T
14      4    6  F
15      5    6  F

source and dest list every pairwise combination of the six data frames. TF records whether the two data frames in a pair are identical ("T") or not ("F").
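
For context, a table of this shape could be built along the following lines. This is only a minimal sketch: `dfs` is a hypothetical list of three small stand-in data frames (not the real data), and `result_sketch` is an illustrative name chosen so it does not clash with `result` above.

```r
# Hypothetical stand-in data: dfs[[1]] and dfs[[2]] are identical, dfs[[3]] differs
dfs <- list(
  data.frame(x = 1:3),
  data.frame(x = 1:3),
  data.frame(x = 4:6)
)

# All unordered pairs of list indices, one pair per column
pairs <- combn(seq_along(dfs), 2)

# Compare each pair with identical() and record "T" or "F"
is_same <- mapply(
  function(i, j) identical(dfs[[i]], dfs[[j]]),
  pairs[1, ], pairs[2, ]
)

result_sketch <- data.frame(
  source = pairs[1, ],
  dest   = pairs[2, ],
  TF     = ifelse(is_same, "T", "F")
)
```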

For instance, data frames 1 and 2 are identical, and 2 and 3 are identical as well. Data frames 1, 2 and 3 are therefore all the same and should share one group id. Next, data frames 4 and 5 are identical, so they get a second group id. Data frame 6 is not identical to any other data frame, so it gets a group id of its own. The desired result is the following table.

> group_id
dfID GroupID
   1       1
   2       1
   3       1
   4       2
   5       2
   6       3

Is there a way to convert the result table into group_id? The challenge is to chain together the pairs marked T that share a data frame number in source or dest, so that each connected group can be identified. Once the groups are identified, they can simply be numbered sequentially from the first to the last. A further challenge is that some data frames may not be identical to any other; each of those needs its own group id as well.


Solution

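  • You can treat the table as a graph: keep only the edges where TF is "T" with subgraph.edges (without dropping any vertices), then use components along with membership to get one group id per connected component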

    library(igraph)
    
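    result %>%
      # first two columns become edges, TF becomes an edge attribute
      graph_from_data_frame() %>%
      # keep only the "T" edges; delete.vertices = FALSE keeps unmatched data frames
      subgraph.edges(which(E(.)$TF == "T"), delete.vertices = FALSE) %>%
      # each connected component is one group of identical data frames
      components() %>%
      membership() %>%
      # named vector -> two-column data frame (values, ind)
      stack() %>%
      # reverse the column order so the id column comes first
      rev() %>%
      setNames(c("dfID", "GroupID")) %>%
      type.convert(as.is = TRUE)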
    

    which gives output like

      dfID GroupID
    1    1       1
    2    2       1
    3    3       1
    4    4       2
    5    5       2
    6    6       3
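
If adding igraph as a dependency is not an option, the same chaining idea can be sketched in base R with a small union-find. This is not part of the answer above, only a minimal sketch: `group_ids` and `find_root` are hypothetical helper names, and the code assumes the columns are named source, dest and TF, with numeric ids, as in the question.

```r
# Pairwise comparison table from the question
result <- data.frame(
  source = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5),
  dest   = c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6),
  TF     = c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F")
)

# Hypothetical helper: assigns one group id per chain of "T" pairs
group_ids <- function(pairs) {
  ids <- sort(unique(c(pairs$source, pairs$dest)))
  # parent[id] points towards the representative (root) of id's group;
  # initially every data frame is its own root
  parent <- setNames(ids, ids)

  # Follow parent links until reaching an id that is its own parent
  find_root <- function(x) {
    while (parent[[as.character(x)]] != x) x <- parent[[as.character(x)]]
    x
  }

  # Merge the groups of every pair marked "T"
  same <- pairs[pairs$TF == "T", ]
  for (i in seq_len(nrow(same))) {
    a <- find_root(same$source[i])
    b <- find_root(same$dest[i])
    # attach the larger root under the smaller one
    if (a != b) parent[[as.character(max(a, b))]] <- min(a, b)
  }

  # Number the groups sequentially in order of first appearance
  roots <- vapply(ids, find_root, numeric(1))
  data.frame(dfID = ids, GroupID = match(roots, unique(roots)))
}

group_id <- group_ids(result)
```

Data frames that never appear in a "T" pair stay their own root, so they receive their own group id without special handling.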