I would like to calculate readability scores in R 3.3.2 (RStudio 3.4 for Windows) using the koRpus package for several .txt files and save the results to Excel, SQLite3, or a text file. At the moment I can only calculate the readability scores for a single file and print them to the console. I tried to extend the code with a loop over a directory, but it does not work correctly.
library(koRpus)
library(tm)
#Loop through files
path = "D://Reports"
out.file<-""
file.names <- dir(path, pattern =".txt")
for(i in 1:length(file.names)){
file <- read.table(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
out.file <- rbind(out.file, file)
}
#Only one file
report <- tokenize(txt =file , format = "file", lang = "en")
#SMOG-Index
results_smog <- SMOG(report)
summary(results_smog)
#Flesch/Kincaid-Index
results_fleshkin <- flesch.kincaid(report)
summary(results_fleshkin)
#FOG-Index
results_fog<- FOG(report)
summary(results_fog)
I ran into this same problem. I was looking through Stack Overflow for a solution and saw your post. After some trial and error, I came up with the following code, which worked fine for me. I stripped out all the extra info. To find the index values of the scores I was looking for, I first ran it on one file and printed the summary of the readability wrapper. That gives you a table with a bunch of different values. Match the column with the row and you get the specific position to look up. There are lots of different options.
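Hard-coded positions may shift if the set of indices or the package version changes, so looking a score up by its label can be more robust. Here is a minimal sketch of that idea. It assumes the summary is a data frame with an index column and a raw column (check the table printed for your own file). The helper get_raw, the mock table and its numbers are placeholders of mine, not koRpus output.

```r
# Mock of the data frame that summary() is ASSUMED to return for a
# readability object; the numbers are placeholders, not real scores.
scores <- data.frame(
  index = c("Flesch", "FOG", "SMOG"),
  raw   = c("60.1", "12.3", "11.8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the raw score of a named index as a number,
# or NA if the label is not in the table.
get_raw <- function(tbl, name) {
  hit <- tbl$raw[tbl$index == name]
  if (length(hit) == 0) return(NA_real_)
  as.numeric(hit[1])
}

smog <- get_raw(scores, "SMOG")
fog  <- get_raw(scores, "FOG")
```

With a real result you would pass summary(readbl.txt) instead of the mock table, using whatever labels appear in its index column.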
Each document should be a separate text file in the path directory.
#Path
path="C:\\Users\\Philipp\\SkyDrive\\Documents\\Thesiswork\\ReadStats\\"
#list text files
ll.files <- list.files(path = path, pattern = "\\.txt$", full.names = TRUE);length(ll.files)
#set vectors
SMOG.score.vec=rep(0.,length(ll.files))
FleshKincaid.score.vec=rep(0.,length(ll.files))
FOG.score.vec=rep(0.,length(ll.files))
#loop through each file
for (i in seq_along(ll.files)){
#tokenize
tagged.text <- koRpus::tokenize(ll.files[i], lang="en")
#hyphen the word for some of the packages that require it
hyph.txt.en <- koRpus::hyphen(tagged.text)
#Readability wrapper
readbl.txt <- koRpus::readability(tagged.text, hyphen=hyph.txt.en, index="all")
#Pull scores, convert to numeric, and update the vectors
SMOG.score.vec[i]=as.numeric(summary(readbl.txt)$raw[36]) #SMOG Score
FleshKincaid.score.vec[i]=as.numeric(summary(readbl.txt)$raw[11]) #Flesch Reading Ease Score
FOG.score.vec[i]=as.numeric(summary(readbl.txt)$raw[22]) #FOG score
#progress message every 10 files
if (i%%10==0) cat("finished",i,"\n")
}
#write all three scores to one csv
df=cbind(FOG.score.vec,FleshKincaid.score.vec,SMOG.score.vec)
colnames(df)=c("FOG", "Flesch Kincaid", "SMOG")
#write.csv sets the header itself and ignores col.names (with a warning)
write.csv(df,file=paste0(path,"Combo.csv"),row.names=FALSE)
#or, if you wanted to write separate csvs (name the column through a data frame)
write.csv(data.frame(SMOG=SMOG.score.vec),file=paste0(path,"SMOG.csv"),row.names=FALSE)
write.csv(data.frame(FOG=FOG.score.vec),file=paste0(path,"FOG.csv"),row.names=FALSE)
write.csv(data.frame("Flesch Kincaid"=FleshKincaid.score.vec,check.names=FALSE),file=paste0(path,"FK.csv"),row.names=FALSE)
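The CSVs above hold only the scores, so a row cannot be traced back to its file. Since the question also asks for plain-text output, here is a rough sketch that keeps the file name next to each score and writes a tab-delimited .txt file with base R. The helper save_scores, the example paths and the numbers are hypothetical placeholders.

```r
# Hypothetical helper: bind file names to their scores and save the
# table as a tab-delimited text file.
save_scores <- function(files, smog, fk, fog, out.file) {
  res <- data.frame(
    file = basename(files),   # drop the directory part of each path
    SMOG = smog,
    FleschKincaid = fk,
    FOG = fog,
    stringsAsFactors = FALSE
  )
  write.table(res, file = out.file, sep = "\t",
              quote = FALSE, row.names = FALSE)
  invisible(res)
}

# Usage with placeholder values (not real readability scores)
out <- tempfile(fileext = ".txt")
save_scores(c("D:/Reports/a.txt", "D:/Reports/b.txt"),
            smog = c(11.8, 10.2), fk = c(60.1, 55.4), fog = c(12.3, 13.0),
            out.file = out)
back <- read.delim(out, stringsAsFactors = FALSE)
```

In the loop above you would call it once after the loop, for example save_scores(ll.files, SMOG.score.vec, FleshKincaid.score.vec, FOG.score.vec, paste0(path, "Scores.txt")).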