
R Text Mining Term Adjacency Matrix


I have created a Document Term Matrix from my Corpus using the tm package.

dtm <- DocumentTermMatrix(myCorpus, control=list(wordLengths=c(4, 20),
       bounds = list(global = c(1,13))))

I then created a Term-Term Adjacency Matrix.

ttm_results <- t(as.matrix(dtm)) %*% as.matrix(dtm)
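
To make the product above concrete, here is a minimal sketch on a small hand-made matrix. The counts and term names are made up for illustration and are not taken from the corpus in the question. Base R's `crossprod(x)` computes the same thing as `t(x) %*% x`.

```r
# Toy document-term counts (hypothetical data): 3 documents, 4 terms
dtm_toy <- matrix(c(1, 0, 2, 0,
                    0, 1, 0, 0,
                    1, 0, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:3),
                                  c("alpha", "beta", "gamma", "delta")))

# Term-term matrix: entry [i, j] is the sum over documents of
# count(term i) * count(term j)
ttm_toy <- t(dtm_toy) %*% dtm_toy

# crossprod() gives the same result without an explicit transpose
ttm_alt <- crossprod(dtm_toy)

ttm_toy
```

The diagonal holds each term's sum of squared counts, so a term that never shares a document with another term has a non-zero entry only on the diagonal.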

When I inspect a sample of my results:

ttm_results[200:205, 200:205]


I notice it is a very large but sparse matrix.

How might I remove rows that are essentially zero?

I consider a row essentially zero if its term has no adjacent terms, as in rows 1, 2 and 5.


Solution

  • How about this:

    # rebuilding your matrix
    m <- diag(6)
    m[3, 3] <- 71
    m[4, 5] <- 1
    m[5, 4] <- 1
    
    m
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    0    0    0    0    0
    [2,]    0    1    0    0    0    0 
    [3,]    0    0   71    0    0    0
    [4,]    0    0    0    1    1    0
    [5,]    0    0    0    1    1    0
    [6,]    0    0    0    0    0    1
    
    # answer: drop every row whose entries sum to exactly 1
    m[rowSums(m) != 1, ]
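
The row-sum test above only catches terms whose single non-zero entry is a 1 on the diagonal, so row 3 (71 on the diagonal, nothing else) is kept. If "no adjacent terms" is taken to mean that every off-diagonal entry in the row is zero, which is an assumption about what the question intends, a possible variant subtracts the diagonal first. The names `off_diag`, `keep` and `m_adj` are just illustrative.

```r
# Same example matrix as in the answer
m <- diag(6)
m[3, 3] <- 71
m[4, 5] <- 1
m[5, 4] <- 1

# Off-diagonal row totals: co-occurrence with *other* terms only
off_diag <- rowSums(m) - diag(m)

# Keep terms that co-occur with at least one other term
keep <- off_diag > 0

# Subset rows and columns together so the result stays a square
# term-term matrix; drop = FALSE keeps it a matrix even with one term left
m_adj <- m[keep, keep, drop = FALSE]
m_adj
```

On this example only rows 4 and 5 remain. Use `m[keep, ]` instead if the columns should be left untouched.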