r machine-learning

How to find the connected instances from a minimum spanning tree model in R


I have successfully built a minimum spanning tree model. I generated a plot and want to identify, for each data point, which other data points it is connected to. Is there a way to do that?

The modeling code is below.

data(iris)
mst.mod <- ape::mst(dist(iris))
plot(mst.mod)

The tree is visualized. It looks a bit messy, but I want to identify, for example, which instances are connected to instance 1, and so on. Visually, it can be seen that instance 1 has an edge with instances 28 and 40. But is there R code to find them all for each data point?


Solution

  • Yes, there is.

    We can use base R. Convert mst.mod to a matrix, apply which() to find the indices where a 1 occurs, and, for instance, convert the result to a list.

    mst.mod = ape::mst(dist(iris))
    unstack(as.data.frame(which(as.matrix(mst.mod)==1L, arr.ind=TRUE)))
    
    

    giving

    > unstack(as.data.frame(which(as.matrix(mst.mod)==1L, arr.ind=TRUE))) |> head()
    $`1`
    [1]  5 18 28 40
    
    $`2`
    [1] 13 35 46
    
    $`3`
    [1] 48
    
    $`4`
    [1] 30 48
    
    $`5`
    [1]  1 38
    
    $`6`
    [1] 11 19
    

    For instance 1, besides 28 and 40 there are also 5 and 18. Depending on the desired output, which(as.matrix(mst.mod)==1L, arr.ind=TRUE) alone might be enough. I haven't checked the documentation/help files to see whether there is a more direct way using {ape}.
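
As a minimal sketch of the same idea, the lookup can be wrapped in a small helper. The function name mst_neighbours and the toy adjacency matrix below are hypothetical, made up for illustration and not part of {ape}. It assumes the input is a symmetric 0/1 adjacency matrix like the one as.matrix(mst.mod) yields. Unlike unstack(), which returns a data frame when every group happens to have the same length, this always returns a list with one entry per node.

```r
# Hypothetical helper (not part of {ape}): list the neighbours of every node,
# given a symmetric 0/1 adjacency matrix such as the one from ape::mst().
mst_neighbours <- function(adj) {
  adj <- unclass(as.matrix(adj))
  # For each row, keep the column indices where an edge is present.
  nb <- lapply(seq_len(nrow(adj)), function(i) unname(which(adj[i, ] == 1)))
  names(nb) <- if (is.null(rownames(adj))) seq_len(nrow(adj)) else rownames(adj)
  nb
}

# Toy tree built by hand (assumption for illustration): 1 - 2 - 3 and 2 - 4.
adj <- matrix(0L, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(2, 4))
adj[edges] <- 1L
adj[edges[, 2:1]] <- 1L  # mirror the edges so the matrix is symmetric

nb <- mst_neighbours(adj)
nb[["2"]]  # neighbours of node 2

# If each edge should appear only once, keep the upper triangle only.
edge_list <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)
```

With the model from the question, mst_neighbours(mst.mod)[["1"]] should give the same neighbours for instance 1 as the unstack() approach above.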