I have an issue with XLConnect library. Its system.file function, I've only seen be used with
file1 <- system.file(file, package ="XLConnect")
Where file is always equal to a string. However I have a list of strings with the file name of each file in it. I have this inside of a for loop that iterates through the list. For whatever reason, file1 is empty even though the file object has the equivalent "soandso.xlsx".
So my question is do I need file to be a string?
Edit #1: So I switched from using the XLConnect library which was giving me a separate error to xlsx library and at least now, I can get the code to run, albeit with different errors.
I think I am making progress though because I had the code unsuccessfully run past the first loop (since k incremented to 2) and I had something populate my dataset variable (where I store the 'read.xlsx2' reponse).
I've attached screenshot of my current error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
Edit # 2:
https://www.dropbox.com/s/s9413oee146v497/Q1%2C%202011%20-%20Halton%2C%20all%20home%20types.xlsx?dl=0 https://www.dropbox.com/s/tgaup2ihnlq7br2/Q4%2C%202015%20-%20Halton%2C%20condo%20types.xlsx?dl=0
I've included 2/60 file I need to attach. All files at my destination are Excel files, which are 10 columns wide and variable row length - due to new regions or just regions just being renamed. The new error I'm getting is :
In addition: Warning message: In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file
Edit#3:
source('~/.active-rstudio-document', echo=TRUE)
> library("openxlsx")
> file_list = list.files(pattern = "xlsx$")
> dataset = data.frame()
> for (file in file_list){
+
+ print(file)
+ temp = read.xlsx(file,sheet = 1, colNames = F)
+ temp$quarter = substr(file,1,2)
+ temp$year = s .... [TRUNCATED]
[1] "~$Q1, 2011 - Halton, all home types.xlsx"
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file
Sorry but I'm still having issues, however I know where the problem is. For some reason what's being printed as file is coming with "~$" before it, when I input just the file name 'Q1, 2011 - Halton, all home types.xlsx' as the file, it works and I get data to be inputted into dataframe. Now the problem with this I obviously cannot mass import 60 different Excel names...
Try using openxlsx
library and the following code:
library("openxlsx")
file_list = list.files(pattern = "xlsx$")
dataset = data.frame()
for (file in file_list){
print(file)
temp = read.xlsx(file,sheet = 1, colNames = F)
temp$quarter = substr(file,1,2)
temp$year = substr(file,4,8)
dataset = rbind(dataset,temp)
}
Edit 1: Added a print(file) to detect which xlsx file is having unzip problems.
Edit 2: Added quarter and year information by reading them from the filename. This only works if all filenames follow the same pattern.