rlapplyopenxlsxreadxl

Importing all the excel worksheets from excel workbooks in a folder as separate datasets in R


I would like to write a function to read in all the excel worksheets of all the excel workbooks in a folder as separate datasets each named after the workbook and sheet.

For example, if the folder contains workbook A, which contains sheet1 and sheet2, I would like datasets created with the following names: workbookA_sheet1, workbookA_sheet2. I feel like I am really close to putting this together but just can't figure out how to modify the code that I found.

For example, this person wants to do a similar thing but only wants to read in a specific sheet from each excel file in the folder. I was trying to modify the code (below) to read in all the sheets from the excel file and I am not sure how to modify the code to do that.

Code from deschen from their answer to the question above:

library(openxlsx)
library(tidyverse)

Starting with your `files_list` we can do:

# using lapply to read in all files and store them as list elements in one list
list_of_dfs <- lapply(as.list(files_list), function(x) readWorkbook(x, sheet = "Balance"))

# Create a vector of names based on the first word of the filename + "Balance"
# Note that we can't use empty space in object names, hence the underscore
df_names <- paste0(str_extract(basename(files_list), "[^ ]+"), "_Balance_df")

# Assign the names to our list of dfs
names(list_of_dfs) <- df_names

# Push the list elements (i.e. data frames) to the Global environment
# I highly recommend NOT doing this. I'd say in 99% of the cases it's better to continue working in the list structure or combine the individual dfs into one large df.
list2env(list_of_dfs, env = .GlobalEnv)


Solution

  • With readxl you’d need a nested loop, outer to cycle through files and inner to cycle through sheets

    Here’s one possible aproach, using readxl example files.

    library(readxl)
    library(purrr)
    library(fs)
    
    all_sheets <- 
      (path_package("readxl") / "extdata") |> 
      dir_ls(glob = "*.xlsx") |> 
      set_names(nm = \(x) path_file(x) |> path_ext_remove()) |> 
      map(\(f) map(excel_sheets(f) |> set_names(), \(s) read_xlsx(path = f, sheet = s))) |> 
      list_flatten() |> 
      set_names(nm = make.names)
    #> New names:
    #> New names:
    #> • `` -> `...2`
    #> • `` -> `...3`
    #> • `` -> `...4`
    #> • `` -> `...5`
    #> • `` -> `...6`
    
    str(all_sheets, max.level = 1)
    #> List of 12
    #>  $ clippy_list.column      : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
    #>  $ clippy_two.row.header   : tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
    #>  $ datasets_mtcars         : tibble [32 × 11] (S3: tbl_df/tbl/data.frame)
    #>  $ datasets_chickwts       : tibble [71 × 2] (S3: tbl_df/tbl/data.frame)
    #>  $ datasets_quakes         : tibble [1,000 × 5] (S3: tbl_df/tbl/data.frame)
    #>  $ deaths_arts             : tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
    #>  $ deaths_other            : tibble [18 × 6] (S3: tbl_df/tbl/data.frame)
    #>  $ geometry_Sheet1         : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
    #>  $ type.me_logical_coercion: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
    #>  $ type.me_numeric_coercion: tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
    #>  $ type.me_date_coercion   : tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
    #>  $ type.me_text_coercion   : tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
    
    # please re-consider ... 
    list2env(all_sheets, env = .GlobalEnv)
    #> <environment: R_GlobalEnv>
    
    ls()
    #>  [1] "all_sheets"               "clippy_list.column"      
    #>  [3] "clippy_two.row.header"    "datasets_chickwts"       
    #>  [5] "datasets_mtcars"          "datasets_quakes"         
    #>  [7] "deaths_arts"              "deaths_other"            
    #>  [9] "geometry_Sheet1"          "type.me_date_coercion"   
    #> [11] "type.me_logical_coercion" "type.me_numeric_coercion"
    #> [13] "type.me_text_coercion"
    

    Created on 2025-04-17 with reprex v2.1.1