
Social Network Analysis: How to generate an edge list from socio-demographic data in R?


Consider a socio-demographic dataset of n individuals with k variables capturing social traits. A Social Network Analysis will be conducted on this dataset. The vertices of the network are the n individuals, and the edges are shared social traits (e.g., all individuals of the same sex are connected by an edge). The weight of an edge is the number of social traits the two individuals share (concordances).
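Stated as a formula, writing x_{it} for the value of trait t for individual i, the weight of the edge between individuals i and j counts the traits on which they agree. Here \mathbf{1}[\cdot] is 1 when the condition holds and 0 otherwise, so each weight lies between 0 and k:

```latex
w_{ij} = \sum_{t=1}^{k} \mathbf{1}\left[ x_{it} = x_{jt} \right], \qquad 1 \le i < j \le n
```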

The question is: how can the edge weights be calculated by comparing the rows for concordances? (This should become clearer below.)

The list of vertices is given by the dataset:

#Reproducible Example:
id <- c(1:4)
sex <- c("male", "male", "female", "female")
age <- c("young", "young", "young", "old")
df <- data.frame(id, sex, age)

#Dataset:
df

  id    sex   age
1  1   male young
2  2   male young
3  3 female young
4  4 female   old

The edges will be:

#All unique combinations of ids:
#(stored in df2 so that the original df is not overwritten)
df2 <- as.data.frame(t(combn(df$id, 2)))

#Edge list:
df2

  V1 V2
1  1  2
2  1  3
3  1  4
4  2  3
5  2  4
6  3  4
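For reference, combn(df$id, 2) returns a 2 × choose(n, 2) matrix with one column per unordered pair of ids, which is why t() is applied before converting to a data frame. A quick sanity check on the example data (the names pairs and edges are only for illustration):

```r
id  <- 1:4
sex <- c("male", "male", "female", "female")
age <- c("young", "young", "young", "old")
df  <- data.frame(id, sex, age)

# combn() returns one column per unordered pair of ids
pairs <- combn(df$id, 2)
dim(pairs)  # 2 rows, choose(4, 2) = 6 columns

# Transpose so that each pair becomes a row, i.e. one edge per row
edges <- as.data.frame(t(pairs))
nrow(edges) == choose(nrow(df), 2)  # TRUE: n * (n - 1) / 2 edges
```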

Put differently, the goal is to add a weights column that counts, for each pair, the social traits the two individuals share, giving a dataset like the following:

  V1 V2 weights
1  1  2       2  #Individual 1 and individual 2 share the same sex and age, thus weight = 2
2  1  3       1  #Individual 1 and individual 3 only share the same age, thus weight = 1
3  1  4       0  #Individual 1 and individual 4 share neither sex nor age, thus weight = 0
4  2  3       1  #...
5  2  4       0
6  3  4       1
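To make the rule concrete for a single pair: take the two rows, compare their trait columns, and count the matches. A minimal sketch, where pair_weight is a hypothetical helper name that assumes every column other than id is a trait and that i and j are row positions (which coincide with the ids in this example):

```r
id  <- 1:4
sex <- c("male", "male", "female", "female")
age <- c("young", "young", "young", "old")
df  <- data.frame(id, sex, age)

# Hypothetical helper: number of traits on which rows i and j agree.
# Assumes every column other than "id" is a trait.
pair_weight <- function(data, i, j) {
  traits <- setdiff(names(data), "id")
  # == on two one-row data frames gives a 1 x k logical matrix
  sum(data[i, traits] == data[j, traits])
}

pair_weight(df, 1, 2)  # 2: same sex and same age
pair_weight(df, 1, 3)  # 1: same age only
pair_weight(df, 1, 4)  # 0: neither
```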

I hope the question is clear. Thank you in advance for your help!


Solution

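  • You can use V1 and V2 from the edge list df2 as row indices into df (this works because each id equals its row number). Comparing the trait columns (2:3) of the two resulting data frames with == gives a logical matrix with one row per pair, and rowSums counts the TRUE values in each row, i.e. the number of shared traits, exactly as in your desired output.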

    df2$weights <- rowSums(df[df2$V1, 2:3] == df[df2$V2, 2:3])
    
      V1 V2 weights
    1  1  2       2
    2  1  3       1
    3  1  4       0
    4  2  3       1
    5  2  4       0
    6  3  4       1
    

    Data

    id <- c(1:4) 
    sex <- c("male", "male", "female", "female")
    age <- c("young", "young", "young", "old")
    df <- data.frame(id, sex, age)
    
    df2 <- as.data.frame(t(combn(df$id, 2)))
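Note that df[df2$V1, 2:3] treats the ids as row positions and hard-codes the trait columns. If the ids are arbitrary labels or there are more traits, one option is to translate ids into row positions with match() and select every non-id column by name. A sketch under those assumptions, where edge_weights and its id_col argument are hypothetical names:

```r
# Example data with non-consecutive ids, to show why match() is needed
df <- data.frame(
  id  = c(10, 20, 30, 40),
  sex = c("male", "male", "female", "female"),
  age = c("young", "young", "young", "old")
)

# Hypothetical helper: edge list with one weight per unordered pair of ids.
# Assumes every column other than id_col is a trait.
edge_weights <- function(data, id_col = "id") {
  traits <- setdiff(names(data), id_col)
  edges  <- as.data.frame(t(combn(data[[id_col]], 2)))
  # Look up row positions instead of assuming id == row number
  r1 <- match(edges$V1, data[[id_col]])
  r2 <- match(edges$V2, data[[id_col]])
  # Logical matrix (one row per pair); rowSums counts shared traits.
  # drop = FALSE keeps a data frame even when there is only one trait.
  edges$weights <- rowSums(data[r1, traits, drop = FALSE] ==
                           data[r2, traits, drop = FALSE])
  edges
}

result <- edge_weights(df)
result
```

For this data the weights come out as 2, 1, 0, 1, 0, 1, the same as in the desired output. If zero-weight pairs should not count as edges in the network, they can be dropped afterwards with subset(result, weights > 0).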