rlapplypurrrreadxl

multiple excel sheets in R


I have a excel file with 131 sheets, that all contain data in the same format. Each sheet contains data of one station, and is named with that number and name. Here is the example for one sheet for station no. 13:

(13) stationName

So far I read the sheets one by one, for example

data13 <- read_excel("mydata.xlsx", n_max = 65, skip = 1, sheet = 7)

In every sheet there's data on 52 variables, that I collect by choosing the required columns.

ID111 <- data13[c(1:9)] 
ID112 <- data13[c(1,10:17)]
ID113 <- data13[c(1,18:25)]
ID114 <- data13[c(1,26:33)]
ID115 <- data13[c(1,34:41)]
ID123 <- data13[c(1,42:49)]
ID122 <- data13[c(1,50:57)]
etc.

these are then put into a list

datasets <- list(ID111, ID112, ID113, ID114, ID115, ID123, ID122) 
names(datasets) <- c(111, 112, 113, 114, 115, 123, 122)

and I reshape the data using

station13 <- map_dfr(datasets, ~ .x %>%
                       setNames( c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")) %>%
                       pivot_longer(cols = "MS":"FL",
                                    names_to = "Phase", 
                                    values_to = "DOY"
                       ), .id = 'Species') %>%
  relocate(Species, .after = 'Year')
station13$StationID <- "13"

at the very end I will combine all dataframes to have all data from all excel sheets in one file using rbind().

So far I did copy&paste it for every station and change the station number manually. Is there a way to make this more efficient?


Solution

  • Instead of reading in each of the 131 sheets manually, you can read all the sheets directly into a list using lapply (then do whatever operations you need on the list)

    library(readxl)
    # File path
    dataxlsx <- "/FilePath/data_file.xlsx"
    
    # Read in all sheets in excel file to a list
    xl_list <- lapply(excel_sheets(dataxlsx), read_excel, path = dataxlsx, n_max = 65, skip = 1)
    

    You can rename with

    names(xl_list) <- excel_sheets(dataxlsx)