I need some help with loading text-file data into R for analysis with packages like koRpus.
The problem I am facing is getting R to recognize a folder full of Word files (about 4,000) as data on which koRpus can then run analyses such as Coleman-Liau indexing. If at all possible, I would prefer to make this work with Word files directly. The key difficulty is getting R to read the text (Word) files in bulk, that is, all at the same time, so that koRpus can do its thing with those files.
My attempts to make this work have all been in vain, but I know that packages like koRpus would be of limited usefulness if there were no way to run them on a large collection of files all at once.
I hope this problem will make sense to someone, and that there is a tenable solution to it.
Thanks, Gordon
Looks like the readtext package should be able to help you out.
library(readtext)
Just specify the folder in the readtext() call, like so:
doc_df <- readtext("doc_files/")
I am not familiar with the koRpus package, but the text column in the resulting data frame should contain what you need for whichever function you want to use:
doc_df$text
#> [1] "Test1: a little bit of text" "Test2: no further text"
#> [3] "Test3: lorem ipsum bla bla"
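If you want to hook this straight into koRpus, here is a minimal, untested sketch of what that might look like. It assumes the English language support package koRpus.lang.en is installed (koRpus moved language support into separate packages) and that tokenize() with format = "obj" accepts the raw text from the data frame:

library(koRpus)
library(koRpus.lang.en)

# tokenize the text of the first document straight from the data frame
tokens <- tokenize(doc_df$text[1], format = "obj", lang = "en")
# compute the Coleman-Liau index on the tokenized text
coleman.liau(tokens)

Note that koRpus may warn that results computed without full POS tagging are estimates; check the package vignette for the tagger setup it recommends.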
In response to your comments:
It looks like your folder contains several kinds of files and you are trying to filter them so that only docx files are processed. The readtext command seems to support that kind of filtering, but according to the documentation this depends on the OS. My suggestion is to instead filter the files in the folder with R's dir() command before calling readtext():
# anchor the pattern so only files ending in .docx are matched
a <- dir("doc_files/", pattern = "\\.docx$", full.names = TRUE)
doc_df <- readtext(a)
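From there you could run the whole folder through koRpus in one go, for example with lapply(). Again an untested sketch, reusing the tokenize()/coleman.liau() combination from above:

# apply the same tokenize/score steps to every document's text
cl_scores <- lapply(doc_df$text, function(txt) {
  coleman.liau(tokenize(txt, format = "obj", lang = "en"))
})

With 4,000 files this may take a while, but it avoids having to touch each file by hand.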