r cosine-similarity quanteda dfm

How do I export a textstat_simil document without losing observations or variables?


I'm new to quanteda and am having trouble exporting my results. I am comparing two dfms: "dfm_latam", with more than 27k observations, and "dfm_cosines", which is built from two corpora whose texts are each compared against every one of the 27k observations in dfm_latam.

corpus_cosine_2 <- corpus(cosine_2_pdf)
corpus_cosines <- corpus_cosine_1 + corpus_cosine_2 
dfm_cosines <- dfm(corpus_cosines, case_insensitive = TRUE)


corpus_latam <- corpus(latam_review)
docvars(corpus_latam, "Text") <- names(corpus_latam$text)
dfm_latam <- dfm(corpus_latam, case_insensitive = TRUE)


simil_latam <- textstat_simil(dfm_latam, dfm_cosines, method = "cosine", margin = "documents", case_insensitive = TRUE)
view(simil_latam)

The view() function in R shows me the first 1000 rows and everything looks fine: both numeric variables from dfm_cosines appear. But when I try to export the result as an Excel file, the output looks completely different from the view() preview. One of the variables is missing, and the .xlsx output only shows the results for "corpus_cosine_1", even though "dfm_cosines" is built from both "corpus_cosine_1" and "corpus_cosine_2". Why does this happen when I export it?

openxlsx::write.xlsx(simil_latam, file = "F:\\path\\simil_latam.xlsx")

So I tried wrapping the export around the view() call:

openxlsx::write.xlsx(view(simil_latam), file = "F:\\path\\simil_latam.xlsx")

With write.xlsx(view()), the right variables show up, but only 1,000 of my 27,000+ observations are exported. How do I automatically export all of the observations in the table with all variables showing up?


Solution

  • You need to convert the textstat_simil object to something more spreadsheet-like. Try

    as.matrix(simil_latam)
    

    before you call write.xlsx(), or, if you prefer this format,

    as.data.frame(simil_latam)
    

    I suggest you inspect both coerced objects before exporting them, and also see the help pages for these methods (found in the quanteda.textstats package).
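
To make the two coercions concrete, here is a minimal sketch on toy data. The texts, the document names (rev1, ref1, and so on), the object names simil_wide and simil_long, and the temporary output file are all made up for illustration and are not from the question; the dfms are built with tokens() followed by dfm(), which is an assumption about a recent quanteda version. It coerces the textstat_simil result to a plain matrix, wraps that in a data frame so the document names survive as row names, and writes it with openxlsx.

```r
library(quanteda)
library(quanteda.textstats)

# Hypothetical stand-ins for the asker's corpora
corpus_latam <- corpus(c(rev1 = "good food and good service",
                         rev2 = "bad service",
                         rev3 = "good value food"))
corpus_cosines <- corpus(c(ref1 = "good food",
                           ref2 = "bad service"))

dfm_latam   <- dfm(tokens(corpus_latam))
dfm_cosines <- dfm(tokens(corpus_cosines))

simil <- textstat_simil(dfm_latam, dfm_cosines,
                        method = "cosine", margin = "documents")

# Wide layout: one row per dfm_latam document, one column per dfm_cosines document
simil_wide <- as.data.frame(as.matrix(simil))

# Pairwise layout produced by the as.data.frame() method for textstat_simil objects
simil_long <- as.data.frame(simil)

# Inspect both before exporting
dim(simil_wide)
head(simil_long)

# Export the wide layout; rowNames = TRUE keeps the document names as a column
out_file <- tempfile(fileext = ".xlsx")  # placeholder path, replace with your own
openxlsx::write.xlsx(simil_wide, file = out_file, rowNames = TRUE)
```

With the real data, simil_wide should have one row per document in dfm_latam and one column per document in dfm_cosines, so checking dim() against the expected 27k+ rows and 2 columns is a quick sanity test before writing the file.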