Tags: r, nlp, tm, term-document-matrix

Maximal term length in Document Term Matrix


Imagine the following Document Term Matrix created by the tm package:

> frequencies
<<DocumentTermMatrix (documents: 255, terms: 470)>>
Non-/sparse entries: 7693/112157
Sparsity           : 94%
Maximal term length: 10
Weighting          : term frequency (tf)

What is the Maximal term length?


Solution

  • Maximal term length is the number of characters in the longest term (or terms, if several tie) in the document term matrix.

    Example: if you have 5 words in the DTM and the longest one is "programming", the maximal term length is 11.

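    library(tm)
    
    # five one-word documents; "programming" (11 characters) is the longest term
    text <- c("word1", "word2", "word3", "word4", "programming")
    corp <- Corpus(VectorSource(text))
    term <- DocumentTermMatrix(corp)
    term  # printing the DTM shows its summary, including "Maximal term length"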
    
    <<DocumentTermMatrix (documents: 5, terms: 5)>>
    Non-/sparse entries: 5/20
    Sparsity           : 80%
    Maximal term length: 11
    Weighting          : term frequency (tf)
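The printed value can also be reproduced by hand. This is a minimal sketch, assuming a recent version of tm where `Terms()` returns the DTM's vocabulary as a character vector; the variable names `dtm`, `term_lengths`, `max_len` and `longest_terms` are illustrative only. It counts the characters of every term, takes the maximum, and lists every term that reaches it, which covers the "one (or more)" case.

```r
library(tm)

# Same toy corpus as in the example above
text <- c("word1", "word2", "word3", "word4", "programming")
corp <- Corpus(VectorSource(text))
dtm  <- DocumentTermMatrix(corp)

# Terms() gives the vocabulary (the column names of a DocumentTermMatrix)
term_lengths <- nchar(Terms(dtm))

# The largest character count is what the summary reports as "Maximal term length"
max_len <- max(term_lengths)
print(max_len)

# Several terms can share the maximum length, so collect all of them
longest_terms <- Terms(dtm)[term_lengths == max_len]
print(longest_terms)
```

Applied to the `frequencies` matrix from the question, `max(nchar(Terms(frequencies)))` should give the same 10 that the print method shows. Because the value is computed from the terms that survive preprocessing, options such as stemming or `wordLengths` in `control` can change it.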