I am trying to combine multiple documents of a corpus into a single document, for use with the topicmodels package. I imported my data from multiple CSVs, each containing multiple lines of text. When I import a CSV, however, each line is treated as a document and each CSV is treated as a corpus. What I would like instead is to merge all of the lines of each CSV into a single document, so that each CSV represents one document in my corpus. I'm not sure if this is possible. Perhaps it would be easier to read in all of the lines of the CSV as a single text when initially importing and then create the docs and corpus, but I don't know how to do that either. Below is the code that I have used to import my CSVs:
library(tm)  # provides VCorpus() and VectorSource()
file <- read.csv("file.csv", stringsAsFactors = FALSE)
fileCorp <- VCorpus(VectorSource(file$text))
The rows in the csv look something like this (where each / represents a line break): 'I walked' / 'the dog' / 'at the' / 'park last night'
I would like to combine each of those lines into a single line of text that will serve as a single document in my corpus.
Thanks for the help!
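You can do this by collapsing each file's lines into one string with paste(..., collapse = " "), so that each CSV becomes a single row (one document). For example, with two data frames standing in for your CSVs: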
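# Two small data frames standing in for the imported CSVs
file1 <- data.frame(text = c('I walked','the dog','at the','park last night'))
file2 <- data.frame(text = c('He walked','the cat','at the','yesterday'))
# Collapse each file's rows into one space-separated string: one row per document
data.frame(id = c(1, 2),
           text = c(paste(file1$text, collapse = " "),
                    paste(file2$text, collapse = " ")))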
  id                                    text
1  1 I walked the dog at the park last night
2  2      He walked the cat at the yesterday
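The same idea extends to reading the CSVs from disk. Below is a minimal sketch, assuming every CSV has a column named text and that all files sit in one directory. The helper collapse_csv() and the temporary demo files are hypothetical and only there to make the example self-contained. The last step builds the corpus with VCorpus(VectorSource(...)) as in the question, and is skipped if tm is not installed.

```r
# Hypothetical helper: read one CSV and collapse its `text` column
# (assumed column name) into a single space-separated string.
collapse_csv <- function(path, column = "text") {
  rows <- read.csv(path, stringsAsFactors = FALSE)
  paste(rows[[column]], collapse = " ")
}

# Demo data: two temporary CSVs standing in for the real files.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
write.csv(data.frame(text = c("I walked", "the dog", "at the", "park last night")),
          file.path(tmp_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(text = c("He walked", "the cat", "at the", "yesterday")),
          file.path(tmp_dir, "file2.csv"), row.names = FALSE)

# One collapsed string per CSV, named after the file it came from.
paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
docs <- vapply(paths, collapse_csv, character(1))
names(docs) <- basename(paths)

# Each element of `docs` becomes one document in the corpus.
if (requireNamespace("tm", quietly = TRUE)) {
  corp <- tm::VCorpus(tm::VectorSource(docs))
}
```

To use this on your own data, point list.files() at the directory holding your CSVs instead of tmp_dir. The resulting corpus has as many documents as there are CSV files.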