My application is reading the xls
and xlsx
files using the read_excel
function of the readxl
package.
The sequence and the exact number of columns are not known earlier while reading the xls
or xlsx
file. There are 15 predefined columns out of which 10 columns are mandatory and remaining 5 columns are optional. So the file will always have minimum 10 columns and at maximum 15 columns.
I need to specify the the col-types
to the mandatory 10 columns. The only way I can think of is using the column names to specify the col_types
as I know for fact that the file has all 10 columns which are mandatory but they are in the random sequence.
I tried looking out for the way of doing so but failed to do so.
Can anyone help me find a way to assign the col_types by column names?
I solve the problem by below workaround. It is not the best way to solve this problem though. I have read the excel file twice which will take a hit on performance if the file has very large volume of data.
First read: Building column data type vector- Reading the file for retrieving the columns information(like column names, number of columns and it's types) and building the column_data_types
vector
which will have the datatype
for every column in the file.
#reading .xlsx file
site_data_columns <- read_excel(paste(File$datapath, ".xlsx", sep = ""))
site_data_column_names <- colnames(site_data_columns)
for(i in 1 : length(site_data_column_names)){
#where date is a column name
if(site_data_column_names[i] == "date"){
column_data_types[i] <- "date"
#where result is a column name
} else if (site_data_column_names[i] == "result") {
column_data_types[i] <- "numeric"
} else{
column_data_types[i] <- "text"
}
}
Second read: Reading the file content- reading the excel file by supplying col_types
parameter with the vector
column_data_types
which has the column data types
.
#reading .xlsx file
site_data <- read_excel(paste(File$datapath, ".xlsx", sep = ""), col_types = column_data_types)