dplyr sna

Group by case where col A & col B = col B & col A, and then mutate column based on within-group similarity


I'm using R to format some data for a social network diagram that I'm visualizing in another program, and I'm looking for a way to designate whether edges are asymmetric or symmetric.

I've got data that look like this:

   source target weight
1       C     D     2
2       F     E     0
3       G     H     1
4       B     A     2
5       H     G     1
6       A     B     2
7       E     F     0
8       D     C     2
9       J     I     1
10      P     O     4
11      M     N     3
12      K     L     0
13      N     M     4
14      O     P     1
15      I     J     3
16      L     K     2

I want to use dplyr to create a column "symmetry" that, for every pair of rows in which the source and target of one row equal the target and source of the other row, checks whether the two "weight" values are equal. If they are equal, "symmetry" should read "Symmetrical"; if they are not, it should read "Asymmetrical".

This is the output I want:

   source target weight     symmetry
1       C     D     2   Symmetrical
2       F     E     0   Symmetrical
3       G     H     1   Symmetrical
4       B     A     2   Symmetrical
5       H     G     1   Symmetrical
6       A     B     2   Symmetrical
7       E     F     0   Symmetrical
8       D     C     2   Symmetrical
9       J     I     1   Asymmetrical
10      P     O     4   Asymmetrical
11      M     N     3   Asymmetrical
12      K     L     0   Asymmetrical
13      N     M     4   Asymmetrical
14      O     P     1   Asymmetrical
15      I     J     3   Asymmetrical
16      L     K     2   Asymmetrical

I tried this:

df<-df%>% 
  group_by(source=pmin(source,target),target=pmax(source,target)) %>% 
  mutate(symmetry=ifelse(n_distinct(weight)==1,"Symmetrical","Asymmetrical")) %>% 
  ungroup()

but found that when I looked at the resulting dataframe, there were a bunch of cases in which source==target, which should not be the case for any row. I'm struggling with both the group_by syntax and the mutate syntax.

My preference is to use dplyr for this task, but would welcome ideas using any other R packages as well! Thank you!


Solution

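  • You are almost there, but you should not overwrite source and target in group_by. The expressions in group_by are evaluated in order, so target = pmax(source, target) sees the already-overwritten source, and every row whose original source sorted after its target ends up with source == target. Store the pair key in helper columns instead, for example: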

    df %>%
        group_by(p = pmin(source, target), q = pmax(source, target)) %>%
        mutate(symmetry = ifelse(n_distinct(weight) == 1, "Symmetrical", "Asymmetrical")) %>%
        ungroup() %>%
        select(-c(p, q))
    

    which gives

    # A tibble: 16 × 4
       source target weight symmetry
       <chr>  <chr>   <int> <chr>
     1 C      D           2 Symmetrical
     2 F      E           0 Symmetrical
     3 G      H           1 Symmetrical
     4 B      A           2 Symmetrical
     5 H      G           1 Symmetrical
     6 A      B           2 Symmetrical
     7 E      F           0 Symmetrical
     8 D      C           2 Symmetrical
     9 J      I           1 Asymmetrical
    10 P      O           4 Asymmetrical
    11 M      N           3 Asymmetrical
    12 K      L           0 Asymmetrical
    13 N      M           4 Asymmetrical
    14 O      P           1 Asymmetrical
    15 I      J           3 Asymmetrical
    16 L      K           2 Asymmetrical
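
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the data are a tibble with character source/target columns and an integer weight column (the values are copied from the table in the question), and the names `broken` and `fixed` are only illustrative. It first re-runs the grouping from the question to show where the source == target rows come from, then applies the helper-column version.

```r
library(dplyr)

# Rebuild the question's data (values copied from the table in the question)
df <- tibble(
  source = c("C", "F", "G", "B", "H", "A", "E", "D",
             "J", "P", "M", "K", "N", "O", "I", "L"),
  target = c("D", "E", "H", "A", "G", "B", "F", "C",
             "I", "O", "N", "L", "M", "P", "J", "K"),
  weight = c(2L, 0L, 1L, 2L, 1L, 2L, 0L, 2L,
             1L, 4L, 3L, 0L, 4L, 1L, 3L, 2L)
)

# Grouping from the question: `source` is overwritten first, so the
# pmax() call that follows compares the new `source` with `target`
broken <- df %>%
  group_by(source = pmin(source, target), target = pmax(source, target)) %>%
  ungroup()

# Count the rows where the pair has collapsed to source == target
sum(broken$source == broken$target)

# Helper-column version: p and q hold the order-independent pair key,
# so the original source and target columns are left untouched
fixed <- df %>%
  group_by(p = pmin(source, target), q = pmax(source, target)) %>%
  mutate(symmetry = ifelse(n_distinct(weight) == 1,
                           "Symmetrical", "Asymmetrical")) %>%
  ungroup() %>%
  select(-c(p, q))

fixed
```

Note that pmin and pmax compare strings here, so the pair key relies on the node names sorting consistently in your locale; for single uppercase letters like these that should not be an issue.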