Tags: r, dataframe, term-document-matrix

R convert dataframe to term-document-matrix


I'm currently learning my way around R and I'm stuck on the following problem:

I've got a data frame that is built up like this:

word       freq1        freq2
tree        10           20
this         2            3
that         4            5
...

It shows the frequency with which each word is used in text 1 (freq1) and text 2 (freq2). Is it possible to transform this into a term-document matrix? I need a term-document matrix in order to apply the following function:

par(mfrow=c(1,1))
comparison.cloud(tdm, random.order=FALSE, colors = 
c("indianred3","lightsteelblue3"),
title.size=2.5, max.words=400)

from https://rpubs.com/brandonkopp/creating-word-clouds-in-r

Thanks :)


Solution

  • EDIT: After reshaping your data:

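    library(reshape2)   # acast()
    library(tidyr)      # gather()
    library(dplyr)      # %>%
    library(wordcloud)  # comparison.cloud()
    # Stack the two frequency columns into long format, then cast to a
    # matrix with words as rows and one column per text.
    df2 <- df %>% 
      gather("Origin", "Freq", c(2, 3)) %>% 
      acast(word ~ Origin, fill = 0, value.var = "Freq")
    comparison.cloud(df2, random.order = FALSE, colors = c("indianred3", "lightsteelblue3"),
                     max.words = 400)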
    

    Result: [image: the resulting comparison cloud]

    Original answer: There is something wrong with your data as it stands. Here is a basic workflow leading up to either a wordcloud or comparison cloud.

    library(tm)
    library(tidyr)      # provides gather()
    library(dplyr)
    library(wordcloud)
    df<-read.table(text="word       freq1        freq2
    
                   Tree        10           20
                   This         2            3
                   That         4            5",header=T)
    df$word<-as.character(df$word)
    df1<-df %>% 
      gather()
    corpus_my<-Corpus(VectorSource(df1))
    tdm<-as.matrix(TermDocumentMatrix(corpus_my))
    comparison.cloud(tdm, random.order=FALSE, colors = c("indianred3","lightsteelblue3"),
                     max.words=400)
    

    This gives a result that is not what you expect, so I would suggest restructuring your data first (as in the edit above).
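
For what it's worth, the restructuring can also be sketched in base R without any reshaping packages. This is only a minimal sketch, assuming the data frame has exactly the columns `word`, `freq1` and `freq2` shown in the question. The name `term_matrix` is made up for illustration. The idea is that `comparison.cloud()` accepts a plain numeric matrix with terms as rows and documents as columns, so the words just need to become row names:

```r
# Example data in the same shape as the question (values taken from the post).
df <- data.frame(
  word  = c("tree", "this", "that"),
  freq1 = c(10, 2, 4),
  freq2 = c(20, 3, 5),
  stringsAsFactors = FALSE
)

# Keep only the frequency columns and use the words as row names:
# rows = terms, columns = documents.
term_matrix <- as.matrix(df[, c("freq1", "freq2")])
rownames(term_matrix) <- df$word

print(term_matrix)

# Draw the cloud only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::comparison.cloud(term_matrix, random.order = FALSE,
                              colors = c("indianred3", "lightsteelblue3"),
                              max.words = 400)
}
```

This should give the same kind of matrix as the `gather()` / `acast()` version when every word appears once in the data frame. If a word occurred on several rows, the row names would no longer be unique and the frequencies would have to be aggregated first.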