I have created a Document Term Matrix from my Corpus using the tm
package.
dtm <- DocumentTermMatrix(myCorpus, control=list(wordLengths=c(4, 20),
bounds = list(global = c(1,13))))
I then created a Term-Term Adjacency Matrix.
ttm_results <- t(as.matrix(dtm)) %*% as.matrix(dtm)
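To illustrate what this product computes, here is a minimal sketch on a made-up matrix of 3 documents and 3 terms. The `toy_dtm` values and names are hypothetical, not taken from my corpus. Base R's `crossprod(x)` should give the same result as `t(x) %*% x`.

```r
# Hypothetical toy document-term matrix: rows are documents, columns are terms
toy_dtm <- matrix(c(1, 1, 0,
                    2, 0, 0,
                    0, 0, 3),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:3),
                                  c("apple", "banana", "cherry")))

# Term-term matrix: entry [i, j] sums count_i * count_j over all documents
ttm_toy <- t(toy_dtm) %*% toy_dtm

# crossprod() computes the same product without an explicit transpose
ttm_toy_cp <- crossprod(toy_dtm)

ttm_toy
```

Here "apple" and "banana" share one document, so their off-diagonal entry is nonzero, while "cherry" only has a diagonal entry.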
When I inspect a sample of my results
ttm_results[200:205, 200:205]
I notice it is a very large but sparse dataset.
How might I remove rows that are essentially zero?
I consider "essentially zero" to include rows like 1, 2 and 5, which do not have adjacent terms.
How about this:
# Rebuilding your matrix
m <- diag(6)
m[3, 3] <- 71
m[4, 5] <- 1
m[5, 4] <- 1
m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    0    0    0    0
[2,]    0    1    0    0    0    0
[3,]    0    0   71    0    0    0
[4,]    0    0    0    1    1    0
[5,]    0    0    0    1    1    0
[6,]    0    0    0    0    0    1
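# Answer: keep only the rows whose sum is not 1,
# i.e. drop rows whose only entry is a single 1 on the diagonal
m[rowSums(m) != 1, ]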
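Note that the `rowSums` test keeps row 3, because its sum is 71 even though it has no adjacent terms. If "no adjacent terms" is taken to mean that every off-diagonal entry in the row is zero (my assumption about what you want), a more general sketch is to zero out the diagonal and keep only rows that still have a nonzero entry. The names `off_diag`, `has_neighbour` and `m_adjacent` are just illustrative.

```r
# Rebuild the example matrix
m <- diag(6)
m[3, 3] <- 71
m[4, 5] <- 1
m[5, 4] <- 1

# Copy the matrix and blank out the diagonal (a term paired with itself)
off_diag <- m
diag(off_diag) <- 0

# A row has a neighbour if any off-diagonal entry is nonzero
has_neighbour <- rowSums(off_diag != 0) > 0

# Keep matching rows and columns so the result stays a square term-term matrix
m_adjacent <- m[has_neighbour, has_neighbour, drop = FALSE]
m_adjacent
```

On this example only rows 4 and 5 survive. For a real term-term matrix the same indexing keeps the term names, since `rowSums` and logical subsetting preserve dimnames.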