I'm currently analysing survey data about people's preferred music genres. I want to build a network to show which genres people often enjoy in tandem.
I'm not sure what the best way to format my data is. It's currently in a form like this:
          Genre A  Genre B  Genre C
Person A     1        1        1
Person B     1        1        0
Person C     1        1        0
Where a 1 indicates that they enjoy it, and a 0 that they don't. I'd like the network to connect genres and for the edges to show how often they're connected. I think the igraph package would work, but I need to convert my data to an edge list first.
Here's some quick code to set up the table in case you need it:
df <- read.table(text =
"GenreA GenreB GenreC
PersonA 1 1 1
PersonB 1 1 0
PersonC 1 1 0
", header = TRUE)
I'm not really sure where to start.
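If you convert your data frame (called df below) to a matrix, the cross-product of that matrix, t(m) %*% m, counts how many people enjoy each pair of genres, so it can serve as the adjacency matrix of the desired graph. The diagonal holds each genre's own total, which diag = FALSE drops.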
library(igraph)
g <- as.matrix(df) |>
  crossprod() |>
  graph_from_adjacency_matrix(mode = 'undirected', diag = FALSE)
Now g is your actual graph (network object):
plot(g)
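To see why the cross-product works here, it may help to inspect the intermediate matrix directly. The sketch below uses base R only and rebuilds the example table so it runs on its own; the name co is just an illustrative choice, not something from the question.

```r
# Rebuild the example table from the question (rows = people, columns = genres)
df <- read.table(text =
"GenreA GenreB GenreC
PersonA 1 1 1
PersonB 1 1 0
PersonC 1 1 0
", header = TRUE)

m  <- as.matrix(df)
co <- crossprod(m)   # same as t(m) %*% m: genre-by-genre co-occurrence counts
co
#>        GenreA GenreB GenreC
#> GenreA      3      3      1
#> GenreB      3      3      1
#> GenreC      1      1      1
```

Each off-diagonal entry is the number of people who enjoy both genres, for example 3 for GenreA and GenreB. The diagonal is the number of people who enjoy each genre on its own, which is why the graph is built with diag = FALSE.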
And if you want it as an edgelist, you can do:
as_edgelist(g)
#> [,1] [,2]
#> [1,] "GenreA" "GenreB"
#> [2,] "GenreA" "GenreB"
#> [3,] "GenreA" "GenreB"
#> [4,] "GenreA" "GenreC"
#> [5,] "GenreB" "GenreC"
Created on 2023-08-18 with reprex v2.0.2
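The edge list above repeats GenreA and GenreB three times, once per person who enjoys both. If one row per genre pair with a count is more convenient, a possible variation is to build a weighted graph instead. This is a sketch, assuming the same example table; g_w and edges are illustrative names, and weighted = TRUE stores the co-occurrence count in an edge attribute called weight.

```r
library(igraph)

# Same example table as in the question
df <- read.table(text =
"GenreA GenreB GenreC
PersonA 1 1 1
PersonB 1 1 0
PersonC 1 1 0
", header = TRUE)

# weighted = TRUE keeps one edge per genre pair and records the count as 'weight'
g_w <- as.matrix(df) |>
  crossprod() |>
  graph_from_adjacency_matrix(mode = "undirected", weighted = TRUE, diag = FALSE)

# One row per pair, with columns from, to and weight
edges <- igraph::as_data_frame(g_w, what = "edges")
edges
```

With the weights stored on the graph, something like plot(g_w, edge.width = E(g_w)$weight) should draw thicker lines between genres that are enjoyed together more often.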