Tags: r, twitter, tm, tidy, term-document-matrix

Word Term Matrix


I would like to create a word matrix from some tweets. Each word that appears in the tweets should become a new variable, filled with 1 only for the tweets that contain that word (and 0 otherwise).

    x <- data.frame("Tweet" = c("hi all","I need help"), "N" = 1, "Reaction" = c("Happy", "Sad"), stringsAsFactors = FALSE)

I'd like to paste the expected output, but honestly I don't know how to do it, sorry.



Solution

  • You could do it like this:

    library(tm)
    
    x <- data.frame("Tweet" = c("hi all","I need help"), "N" = 1, "Reaction" = c("Happy", "Sad"), stringsAsFactors = FALSE)
    
    corp <- VCorpus(VectorSource(x$Tweet))
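    # adjust wordLengths (default is c(3, Inf)) so short words like "hi" and "I" are kept;
    # weightBin stores 1 if a word occurs in a tweet and 0 otherwise (the default weightTf stores counts)
    dtm <- DocumentTermMatrix(corp, control = list(wordLengths = c(1, Inf), weighting = weightBin))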
    data.frame(Tweet = x$Tweet, as.matrix(dtm), Reaction = x$Reaction)
    
                Tweet all help hi i need Reaction
    1      hi all   1    0  1 0    0    Happy
    2 I need help   0    1  0 1    1      Sad
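  • If you'd rather not depend on tm, here is a minimal base-R sketch of the same 0/1 matrix. It assumes words are separated by whitespace and lower-cases them (as tm does by default); it does not strip punctuation, so "hi," and "hi" would end up as separate columns. The names tokens, vocab, m and res are just illustrative.

```r
x <- data.frame(Tweet = c("hi all", "I need help"), N = 1,
                Reaction = c("Happy", "Sad"), stringsAsFactors = FALSE)

# Lower-case each tweet and split it into words on runs of whitespace
tokens <- strsplit(tolower(x$Tweet), "\\s+")

# Sorted vocabulary of all distinct words across the tweets
vocab <- sort(unique(unlist(tokens)))

# vapply returns one column per tweet (1 if the word is present, 0 if not);
# transpose so rows are tweets and columns are words
m <- t(vapply(tokens, function(w) as.integer(vocab %in% w), integer(length(vocab))))
colnames(m) <- vocab

res <- data.frame(Tweet = x$Tweet, m, Reaction = x$Reaction)
print(res)
```

For this example it yields the same columns (all, help, hi, i, need) and the same 1/0 values as the tm version above. Because `%in%` only tests presence, a word repeated within a tweet still gets 1.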