I really need help with code to create a weighted adjacency matrix from a dataset; some rows contain 1 or 2 ingredients, but others have more (up to 8). The resulting matrix will likely be upwards of 16x16 based on the number of unique ingredients in the dataset.
My data currently looks like the example below (but with different information). Which column an ingredient appears in does not matter for this network analysis, but the co-occurrences and their weights do.
| name1 | name2 | name3 | name4 | name5 | name6 | name7 | name8 |
|---|---|---|---|---|---|---|---|
| pineapple | sugar | mango | water | salt | blueberry | | |
| pineapple | asca | | | | | | |
| sugar | pineapple | water | lime | | | | |
| lime | asca | pepper | salt | water | | | |
| blueberry | pineapple | water | salt | strawberry | banana | asca | sugar |
| mango | | | | | | | |
How do I write the code so that it will find all the co-occurrences/edges from all the columns, and not just the first two columns? That's one issue I'm having with trying to do the adjacency matrix from this data directly in R. I also need to preserve the names for the nodes (ingredients) so that when I create my network graph, the names will show up and not numbers, another issue I've had.
I have solid code that creates the network graph from an adjacency matrix for this new project, but previously I manually calculated the weighted adjacency matrix for a sample set as I was on a tight deadline.
If row-wise co-occurrences are what you want, you can modify the answer by @ThomasIsCoding:
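# Transpose so each row of df becomes a column, then stack to long form
# (values = ingredient, ind = original row) and cross-tabulate.
# [-1, ] drops the first level, here the empty string left by blank cells.
# tcrossprod() multiplies the table by its transpose, counting shared rows.
m <- tcrossprod(table(stack(as.data.frame(t(df))))[-1,])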
m
#> values
#> values asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
#> asca 3 1 1 1 0 1 2 2 1 1 2
#> banana 1 1 1 0 0 0 1 1 1 1 1
#> blueberry 1 1 2 0 1 0 2 2 1 2 2
#> lime 1 0 0 2 0 1 1 1 0 1 2
#> mango 0 0 1 0 2 0 1 1 0 1 1
#> pepper 1 0 0 1 0 1 0 1 0 0 1
#> pineapple 2 1 2 1 1 0 4 2 1 3 3
#> salt 2 1 2 1 1 1 2 3 1 2 3
#> strawberry 1 1 1 0 0 0 1 1 1 1 1
#> sugar 1 1 2 1 1 0 3 2 1 3 3
#> water 2 1 2 2 1 1 3 3 1 3 4
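To see why this works, the one-liner can be unpacked into steps. The following is only a sketch on a cut-down, hypothetical three-column version of the question's data (built in base R rather than with `fread`), assuming R 4.0 or later and that blank cells are empty strings. It drops the blank level by name instead of by position, which is a small variation on the line above, not a replacement for it.

```r
# Hypothetical toy data: first three rows, three columns, blanks as "".
toy <- data.frame(
  name1 = c("pineapple", "pineapple", "sugar"),
  name2 = c("sugar", "asca", "pineapple"),
  name3 = c("mango", "", "water"),
  stringsAsFactors = FALSE
)

# 1. Transpose so each original row becomes a column, then stack to long
#    form: `values` is the ingredient, `ind` identifies the original row.
long <- stack(as.data.frame(t(toy)))

# 2. Cross-tabulate ingredient x row. This is the incidence table.
inc <- table(long)

# 3. Drop the blank-string level by name rather than by position.
inc <- inc[rownames(inc) != "", , drop = FALSE]

# 4. inc %*% t(inc): entry [i, j] counts the rows containing both i and j.
#    The diagonal holds how many rows each ingredient appears in.
toy_m <- tcrossprod(inc)
toy_m
```

If blank cells are `NA` rather than `""`, `table()` skips them by default and step 3 simply removes nothing. Note that an ingredient listed twice in the same row would be counted twice in the incidence table, so the weights assume each ingredient appears at most once per row.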
Set the main diagonal to 0, if you want.
diag(m) <- 0
m
#> values
#> values asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
#> asca 0 1 1 1 0 1 2 2 1 1 2
#> banana 1 0 1 0 0 0 1 1 1 1 1
#> blueberry 1 1 0 0 1 0 2 2 1 2 2
#> lime 1 0 0 0 0 1 1 1 0 1 2
#> mango 0 0 1 0 0 0 1 1 0 1 1
#> pepper 1 0 0 1 0 0 0 1 0 0 1
#> pineapple 2 1 2 1 1 0 0 2 1 3 3
#> salt 2 1 2 1 1 1 2 0 1 2 3
#> strawberry 1 1 1 0 0 0 1 1 0 1 1
#> sugar 1 1 2 1 1 0 3 2 1 0 3
#> water 2 1 2 2 1 1 3 3 1 3 0
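Because the matrix keeps the ingredient names as its row and column names, they can be carried through to whatever builds the graph. The question's plotting code is not shown, so the following is only a sketch of one possible next step: turning a named, symmetric matrix into a weighted edge list in base R. The small matrix `adj` and the column names `from`, `to`, and `weight` are hypothetical, chosen for illustration.

```r
# Hypothetical 3 x 3 co-occurrence matrix with the diagonal already zeroed.
ing <- c("pineapple", "sugar", "asca")
adj <- matrix(c(0, 2, 1,
                2, 0, 0,
                1, 0, 0),
              nrow = 3, dimnames = list(ing, ing))

# Keep each undirected pair once (upper triangle) and skip zero weights.
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)

# Look the names up from the dimnames so nodes are labelled, not numbered.
edges <- data.frame(
  from   = rownames(adj)[idx[, "row"]],
  to     = colnames(adj)[idx[, "col"]],
  weight = adj[idx]
)
edges
```

An edge list like this, or the named matrix itself, is the kind of input most graph packages accept, so the node labels should come out as ingredient names as long as the dimnames are not stripped along the way.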
Data:
df <- data.table::fread("name1 name2 name3 name4 name5 name6 name7 name8
pineapple sugar mango water salt blueberry
pineapple asca
sugar pineapple water lime
lime asca pepper salt water
blueberry pineapple water salt strawberry banana asca sugar
mango ")