r igraph adjacency-matrix

Code for weighted adjacency matrix from df with 8 columns of string data?


I need help writing code that builds a weighted adjacency matrix from a dataset of ingredient lists. Some rows contain only 1 or 2 ingredients, while others have more (up to 8). The resulting matrix will likely be 16x16 or larger, depending on the number of unique ingredients in the dataset.

My data currently looks like the example below (but with different information). Which column an ingredient appears in does not matter for this network analysis; only the co-occurrences and their weights do.

name1 name2 name3 name4 name5 name6 name7 name8
pineapple sugar mango water salt blueberry
pineapple asca
sugar pineapple water lime
lime asca pepper salt water
blueberry pineapple water salt strawberry banana asca sugar
mango

How do I write the code so that it finds the co-occurrences (edges) across all columns, not just the first two? That is one problem I run into when trying to build the adjacency matrix from this data directly in R. I also need to preserve the node names (ingredients) so that my network graph shows names rather than numbers, which is the other problem I have had.

I have solid code that creates the network graph from an adjacency matrix for this new project, but previously I manually calculated the weighted adjacency matrix for a sample set as I was on a tight deadline.


Solution

  • If row-wise co-occurrences are what you want, you can adapt the answer by @ThomasIsCoding:

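    # t(df) puts each row of df in its own column; stack() melts that to long format.
    # table() then counts ingredients (rows) per original df row (columns);
    # [-1, ] drops the first table row, the empty string left by blank cells.
    # tcrossprod(x) is x %*% t(x): the ingredient-by-ingredient co-occurrence counts.
    m <- tcrossprod(table(stack(as.data.frame(t(df))))[-1,])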
    m
    #>             values
    #> values       asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
    #>   asca          3      1         1    1     0      1         2    2          1     1     2
    #>   banana        1      1         1    0     0      0         1    1          1     1     1
    #>   blueberry     1      1         2    0     1      0         2    2          1     2     2
    #>   lime          1      0         0    2     0      1         1    1          0     1     2
    #>   mango         0      0         1    0     2      0         1    1          0     1     1
    #>   pepper        1      0         0    1     0      1         0    1          0     0     1
    #>   pineapple     2      1         2    1     1      0         4    2          1     3     3
    #>   salt          2      1         2    1     1      1         2    3          1     2     3
    #>   strawberry    1      1         1    0     0      0         1    1          1     1     1
    #>   sugar         1      1         2    1     1      0         3    2          1     3     3
    #>   water         2      1         2    2     1      1         3    3          1     3     4
    

    The main diagonal holds the number of rows each ingredient appears in. Set it to 0 if you do not want self-loops in the graph.

    diag(m) <- 0
    m
    #>             values
    #> values       asca banana blueberry lime mango pepper pineapple salt strawberry sugar water
    #>   asca          0      1         1    1     0      1         2    2          1     1     2
    #>   banana        1      0         1    0     0      0         1    1          1     1     1
    #>   blueberry     1      1         0    0     1      0         2    2          1     2     2
    #>   lime          1      0         0    0     0      1         1    1          0     1     2
    #>   mango         0      0         1    0     0      0         1    1          0     1     1
    #>   pepper        1      0         0    1     0      0         0    1          0     0     1
    #>   pineapple     2      1         2    1     1      0         0    2          1     3     3
    #>   salt          2      1         2    1     1      1         2    0          1     2     3
    #>   strawberry    1      1         1    0     0      0         1    1          0     1     1
    #>   sugar         1      1         2    1     1      0         3    2          1     0     3
    #>   water         2      1         2    2     1      1         3    3          1     3     0
    

    Data:

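    # Comma-separated so the blank cells survive copy and paste;
    # fread reads empty character fields as "" by default.
    df <- data.table::fread("name1,name2,name3,name4,name5,name6,name7,name8
    pineapple,sugar,mango,water,salt,blueberry,,
    pineapple,asca,,,,,,
    sugar,pineapple,water,lime,,,,
    lime,asca,pepper,salt,water,,,
    blueberry,pineapple,water,salt,strawberry,banana,asca,sugar
    mango,,,,,,,")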
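
The one-liner above depends on the blank cells sorting first so that `[-1, ]` can drop them. As a rough sketch of the same idea in explicit steps (this is not the answerer's code; the names `rows`, `long`, `inc`, and `g` are made up for illustration, and the last step assumes the igraph package is installed), the matrix can be built and turned into a named, weighted graph roughly like this:

```r
# Example data from the question, one character vector per row.
rows <- list(
  c("pineapple", "sugar", "mango", "water", "salt", "blueberry"),
  c("pineapple", "asca"),
  c("sugar", "pineapple", "water", "lime"),
  c("lime", "asca", "pepper", "salt", "water"),
  c("blueberry", "pineapple", "water", "salt", "strawberry", "banana", "asca", "sugar"),
  c("mango")
)

# Pad every row to 8 cells with "" so the shape matches the original df.
df <- as.data.frame(
  do.call(rbind, lapply(rows, function(x) c(x, rep("", 8 - length(x))))),
  stringsAsFactors = FALSE
)
names(df) <- paste0("name", 1:8)

# Step 1: long format, one (ingredient, row id) pair per cell.
mat  <- as.matrix(df)
long <- data.frame(
  ingredient = as.vector(mat),
  row_id     = as.vector(row(mat)),
  stringsAsFactors = FALSE
)
# Drop blanks explicitly instead of relying on "" being the first table level.
long <- long[!is.na(long$ingredient) & long$ingredient != "", ]

# Step 2: 0/1 incidence matrix (ingredients x rows of df).
inc <- 1 * (unclass(table(long$ingredient, long$row_id)) > 0)

# Step 3: co-occurrence counts; entry [i, j] is the number of rows containing both.
m <- tcrossprod(inc)
diag(m) <- 0  # no self-loops

# Step 4 (assumes igraph is installed): weighted, undirected graph.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE)
  print(igraph::V(g)$name)  # vertex names taken from the matrix dimnames
}
```

Because the matrix carries the ingredient names as its dimnames, `graph_from_adjacency_matrix()` should use them as vertex names, so a plot of `g` is labelled with ingredients rather than numbers. The edge weights end up in `E(g)$weight`.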