I have an Excel file with 131 sheets that all contain data in the same format. Each sheet holds the data of one station and is named with that station's number and name. Here is an example of the sheet name for station no. 13:
(13) stationName
So far I have read the sheets one by one, for example:
data13 <- read_excel("mydata.xlsx", n_max = 65, skip = 1, sheet = 7)
Every sheet contains data on 52 variables, which I collect by selecting the required columns:
ID111 <- data13[c(1:9)]
ID112 <- data13[c(1,10:17)]
ID113 <- data13[c(1,18:25)]
ID114 <- data13[c(1,26:33)]
ID115 <- data13[c(1,34:41)]
ID123 <- data13[c(1,42:49)]
ID122 <- data13[c(1,50:57)]
etc.
These are then put into a list:
datasets <- list(ID111, ID112, ID113, ID114, ID115, ID123, ID122)
names(datasets) <- c(111, 112, 113, 114, 115, 123, 122)
and I reshape the data using
station13 <- map_dfr(datasets, ~ .x %>%
    setNames(c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")) %>%
    pivot_longer(cols = "MS":"FL",
                 names_to = "Phase",
                 values_to = "DOY"),
  .id = 'Species') %>%
  relocate(Species, .after = 'Year')
station13$StationID <- "13"
At the very end I will combine all data frames with rbind() so that the data from all Excel sheets ends up in one data frame.
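For example something like this, where station14 and station15 are just placeholders for the other stations:
allStations <- do.call(rbind, list(station13, station14, station15))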
So far I have copied and pasted this for every station and changed the station number manually. Is there a way to make this more efficient?
Instead of reading each of the 131 sheets manually, you can read all the sheets directly into a list using lapply() and then do whatever operations you need on the list:
library(readxl)
# File path
dataxlsx <- "/FilePath/data_file.xlsx"
# Read all sheets of the Excel file into a list
xl_list <- lapply(excel_sheets(dataxlsx), read_excel, path = dataxlsx, n_max = 65, skip = 1)
You can name the list elements after the sheets with:
names(xl_list) <- excel_sheets(dataxlsx)
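From there you can apply your reshaping to every element of the list and bind all stations together in one step instead of copy&pasting per station. Below is a rough sketch of how that could look; it assumes every sheet in the workbook is a station sheet named like "(13) stationName", the species IDs and column positions are taken from the blocks shown in your question (you would extend species_ids and species_cols for the remaining ones), and reshape_station() is just a helper name I made up:

library(readxl)
library(dplyr)
library(tidyr)
library(purrr)

dataxlsx <- "/FilePath/data_file.xlsx"

# Read every sheet into a named list (as above)
xl_list <- lapply(excel_sheets(dataxlsx), read_excel,
                  path = dataxlsx, n_max = 65, skip = 1)
names(xl_list) <- excel_sheets(dataxlsx)

# Species IDs and their column positions, following the blocks shown in the question
species_ids  <- c(111, 112, 113, 114, 115, 123, 122)
species_cols <- list(2:9, 10:17, 18:25, 26:33, 34:41, 42:49, 50:57)

# Reshape one station sheet: split it into per-species blocks and pivot to long format
reshape_station <- function(sheet_data) {
  datasets <- map(species_cols, ~ sheet_data[c(1, .x)])
  names(datasets) <- species_ids
  map_dfr(datasets, ~ .x %>%
            setNames(c("Year", "MS", "UL", "JS", "BF", "FF", "RF", "CL", "FL")) %>%
            pivot_longer(cols = MS:FL, names_to = "Phase", values_to = "DOY"),
          .id = "Species") %>%
    relocate(Species, .after = Year)
}

# Apply the helper to every sheet, take the StationID from the "(13)" part of the
# sheet name, and bind all stations into one data frame
all_stations <- imap_dfr(xl_list, function(sheet_data, sheet_name) {
  reshape_station(sheet_data) %>%
    mutate(StationID = gsub("^\\((\\d+)\\).*", "\\1", sheet_name))
})

With this approach the per-station copy&paste and the final rbind() call are no longer needed, since imap_dfr() already row-binds the result for every sheet.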