I'm using Revolution R Enterprise (RevoR) to import large data files. The example in the documentation imports 10 files (1,000,000 rows each) into a single dataset using an rxImport loop like this:
setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
# Base name of the input files mortDefault2000.csv ... mortDefault2009.csv
Data.File <- file.path(Data.Directory, "mortDefault")
mortXdfFileName <- "mortDefault.xdf"
append <- "none"  # first pass creates the XDF file
for (i in 2000:2009) {
  importFile <- paste(Data.File, i, ".csv", sep = "")
  mortxdf <- rxImport(importFile, mortXdfFileName, append = append, overwrite = TRUE, maxRowsByCols = NULL)
  append <- "rows"  # later passes append their rows to the same XDF file
}
mortxdfData <- RxXdfData(mortXdfFileName)
# Convert the XDF data to a data.frame that is passed back to KNIME
knime.out <- rxXdfToDataFrame(mortxdfData)
The issue is that I only get 500000 rows in the dataset, which I attributed to the maxRowsByCols argument (the maximum size of a returned data frame, measured as rows × columns; the default is 1e+06). I changed it to a higher value and then to NULL, but the data is still truncated.
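Fixed: the truncation did not happen during the import, since rxImport writes to an XDF file here and its maxRowsByCols setting only affects results returned as a data frame. It happened when converting the XDF data to a data.frame: rxXdfToDataFrame() applies its own maxRowsByCols limit, and setting it to NULL in that call converts the whole XDF file into a data.frame object for KNIME.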
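A minimal sketch of the corrected conversion, assuming the RevoScaleR package (shipped with Revolution R Enterprise) is available and that mortDefault.xdf has already been written by the import loop in the question. The row-count check with rxGetInfo() is an illustrative addition, not part of the original fix.

```r
library(RevoScaleR)  # assumed available: ships with Revolution R Enterprise

mortXdfFileName <- "mortDefault.xdf"  # XDF file written by the rxImport loop
mortxdfData <- RxXdfData(mortXdfFileName)

# maxRowsByCols caps rows x columns of the returned data.frame (default 1e+06);
# NULL removes the cap, so the whole XDF file is read into memory.
knime.out <- rxXdfToDataFrame(mortxdfData, maxRowsByCols = NULL)

# Illustrative sanity check: compare the data.frame with the XDF metadata
xdfInfo <- rxGetInfo(mortxdfData)
cat("XDF rows:", xdfInfo$numRows, " data.frame rows:", nrow(knime.out), "\n")
```

Keep in mind that removing the limit loads every row into RAM, so the machine running KNIME needs enough memory for the full dataset.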