I have six data frames. Some of them have the same structure and values, and others do not.
I compared every pair of these data frames using identical() in a loop
and summarized the comparison results in the following table.
result <- data.frame(source=c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5), dest=c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6), TF=c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F"))
> result
   source dest TF
1       1    2  T
2       1    3  F
3       1    4  F
4       1    5  F
5       1    6  F
6       2    3  T
7       2    4  F
8       2    5  F
9       2    6  F
10      3    4  F
11      3    5  F
12      3    6  F
13      4    5  T
14      4    6  F
15      5    6  F
source and dest list every pairwise combination of the six data frames. TF stores whether the two data frames in a pair are the same or not.
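For reference, a table of this shape could be built with combn() and identical(). This is only a sketch: the list name dfs, its toy contents, and the names pairs, same and result_sketch are hypothetical and are not the actual data behind the table above.

```r
# Toy stand-ins for the six data frames (hypothetical contents).
dfs <- list(
  data.frame(a = 1:3),
  data.frame(a = 1:3),
  data.frame(a = 4:6),
  data.frame(a = 4:6),
  data.frame(a = 7:9),
  data.frame(a = 1:3)
)

# Every unordered pair of list positions, one pair per column.
pairs <- combn(seq_along(dfs), 2)

# Compare each pair with identical(); TRUE means same structure and values.
same <- apply(pairs, 2, function(p) identical(dfs[[p[1]]], dfs[[p[2]]]))

# Same layout as the result table: source, dest, and a "T"/"F" flag.
result_sketch <- data.frame(
  source = pairs[1, ],
  dest   = pairs[2, ],
  TF     = ifelse(same, "T", "F")
)
```

combn() enumerates the pairs in the same order as the table above (1-2, 1-3, ..., 5-6), so the rows line up with source and dest.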
For instance, data frames 1 and 2 are the same, and 2 and 3 are the same as well. Thus data frames 1, 2 and 3 are all the same and should share one unique group id. Next, data frames 4 and 5 are the same, so they get another group id. Data frame 6 has no identical data frame, so it gets a group id of its own. This should return the following table.
> group_id
dfID, GroupID
1 1
2 1
3 1
4 2
5 2
6 3
Are there any ideas for how to convert the result table into group_id?
The challenge is how to define the connections (chains) between data frames marked T that share the same data frame number in source or dest, so that we can identify the groups. Once the groups are identified, we can simply number them sequentially from the first to the last. Another challenge is that there may be several unique data frames that are not identical to any other. We need to assign a unique group id to each of them as well.
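The chaining described above amounts to finding connected components among the pairs marked T. As a rough base R sketch of that idea (the names ids, label, hits and group_id_sketch are made up for illustration), every data frame starts in its own group and two groups are merged whenever a pair is marked T:

```r
result <- data.frame(
  source = c(1,1,1,1,1,2,2,2,2,3,3,3,4,4,5),
  dest   = c(2,3,4,5,6,3,4,5,6,4,5,6,5,6,6),
  TF     = c("T","F","F","F","F","T","F","F","F","F","F","F","T","F","F")
)

# All data frame IDs that appear on either side of a comparison.
ids <- sort(unique(c(result$source, result$dest)))

# Start with every data frame in its own group, labelled by its own ID.
label <- setNames(ids, ids)

# For each pair marked "T", relabel the larger group label to the smaller one.
hits <- result[result$TF == "T", ]
for (i in seq_len(nrow(hits))) {
  a <- label[as.character(hits$source[i])]
  b <- label[as.character(hits$dest[i])]
  label[label == max(a, b)] <- min(a, b)
}

# Renumber the surviving labels 1, 2, 3, ... in order of first appearance.
group_id_sketch <- data.frame(
  dfID    = ids,
  GroupID = match(label, unique(label))
)
```

Data frames that never appear in a T pair, such as 6 here, keep their initial label and so end up in a group of their own.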
You can use subgraph.edges and components along with membership to do this:
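library(igraph)
result %>%
  # build a graph: vertices are data frame IDs, edges are the compared pairs
  graph_from_data_frame() %>%
  # keep only the "T" edges, but keep every vertex so unmatched data frames survive
  subgraph.edges(which(E(.)$TF == "T"), delete.vertices = FALSE) %>%
  # connected components are the groups
  components() %>%
  membership() %>%
  # reshape the named membership vector into a two-column data frame
  stack() %>%
  rev() %>%
  setNames(c("dfID", "GroupID")) %>%
  type.convert(as.is = TRUE)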
which gives output like
  dfID GroupID
1    1       1
2    2       1
3    3       1
4    4       2
5    5       2
6    6       3