rexcelreadxl

Is there any way to assign the col_types by column names in read_excel(readxl) in R


My application is reading the xls and xlsx files using the read_excel function of the readxl package.

The sequence and the exact number of columns are not known earlier while reading the xls or xlsx file. There are 15 predefined columns out of which 10 columns are mandatory and remaining 5 columns are optional. So the file will always have minimum 10 columns and at maximum 15 columns.

I need to specify the the col-types to the mandatory 10 columns. The only way I can think of is using the column names to specify the col_types as I know for fact that the file has all 10 columns which are mandatory but they are in the random sequence.

I tried looking out for the way of doing so but failed to do so.

Can anyone help me find a way to assign the col_types by column names?


Solution

  • I solve the problem by below workaround. It is not the best way to solve this problem though. I have read the excel file twice which will take a hit on performance if the file has very large volume of data.

    First read: Building column data type vector- Reading the file for retrieving the columns information(like column names, number of columns and it's types) and building the column_data_types vector which will have the datatype for every column in the file.

    #reading .xlsx file
    site_data_columns <- read_excel(paste(File$datapath, ".xlsx", sep = ""))
    
    site_data_column_names <- colnames(site_data_columns)
    
    for(i in 1 : length(site_data_column_names)){  
    
        #where date is a column name
        if(site_data_column_names[i] == "date"){
             column_data_types[i] <- "date"
    
             #where result is a column name
             } else if (site_data_column_names[i] == "result") {
                          column_data_types[i] <- "numeric"
    
             } else{
                    column_data_types[i] <- "text"
            }
    }
    

    Second read: Reading the file content- reading the excel file by supplying col_types parameter with the vector column_data_types which has the column data types.

    #reading .xlsx file
    site_data <- read_excel(paste(File$datapath, ".xlsx", sep = ""), col_types = column_data_types)