I would like to write a function to read in all the excel worksheets of all the excel workbooks in a folder as separate datasets each named after the workbook and sheet.
For example, if the folder contains workbook A, which contains sheet1 and sheet2, I would like datasets created with the following names: workbookA_sheet1, workbookA_sheet2. I feel like I am really close to putting this together but just can't figure out how to modify the code that I found.
For example, this person wants to do a similar thing but only wants to read in a specific sheet from each excel file in the folder. I was trying to modify the code (below) to read in all the sheets from the excel file and I am not sure how to modify the code to do that.
Code from deschen from their answer to the question above:
library(openxlsx)
library(tidyverse)
Starting with your `files_list` we can do:
# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))
# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")
# Assign the names to our list of dfs
names(list_of_dfs) <- df_names
# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, env = .GlobalEnv)
With readxl
you’d need a nested loop,
outer to cycle through files and inner to cycle through sheets
Here’s one possible aproach, using readxl
example files.
dir_ls
returns named vector, 1st set_names
cleans names (filenames without extensions);map
for nested loop, returns nested list, sheets for inner level names;list_flatten
flattens list and by default uses "{outer}_{inner}"
for naming;set_name
calls make.names
on existing names to make those syntactically valid (e.g. replaces characters like “-”).library(readxl)
library(purrr)
library(fs)
all_sheets <-
(path_package("readxl") / "extdata") |>
dir_ls(glob = "*.xlsx") |>
set_names(nm = \(x) path_file(x) |> path_ext_remove()) |>
map(\(f) map(excel_sheets(f) |> set_names(), \(s) read_xlsx(path = f, sheet = s))) |>
list_flatten() |>
set_names(nm = make.names)
#> New names:
#> New names:
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`
#> • `` -> `...5`
#> • `` -> `...6`
str(all_sheets, max.level = 1)
#> List of 12
#> $ clippy_list.column : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> $ clippy_two.row.header : tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
#> $ datasets_mtcars : tibble [32 × 11] (S3: tbl_df/tbl/data.frame)
#> $ datasets_chickwts : tibble [71 × 2] (S3: tbl_df/tbl/data.frame)
#> $ datasets_quakes : tibble [1,000 × 5] (S3: tbl_df/tbl/data.frame)
#> $ deaths_arts : tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
#> $ deaths_other : tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
#> $ geometry_Sheet1 : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
#> $ type.me_logical_coercion: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#> $ type.me_numeric_coercion: tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
#> $ type.me_date_coercion : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
#> $ type.me_text_coercion : tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
# please re-consider ...
list2env(all_sheets, env = .GlobalEnv)
#> <environment: R_GlobalEnv>
ls()
#> [1] "all_sheets" "clippy_list.column"
#> [3] "clippy_two.row.header" "datasets_chickwts"
#> [5] "datasets_mtcars" "datasets_quakes"
#> [7] "deaths_arts" "deaths_other"
#> [9] "geometry_Sheet1" "type.me_date_coercion"
#> [11] "type.me_logical_coercion" "type.me_numeric_coercion"
#> [13] "type.me_text_coercion"
Created on 2025-04-17 with reprex v2.1.1