rimporttidyversedata-wranglingreadxl

Concatenate columns from Excel file based on sheet name and column's name


I have an Excel file that has multiple sheetnames and these sheet names don't always have the same structure. I want to be able to read the Excel file, read only some specifics sheets, select some specific columns and then create a concatenated table I have the following Excel file:

File_Mother.xlsx it has 5 sheets but we would only need those sheets that have the pattern "inning_19" at the beginning of their names so "inning_195" and "inning_1988" will be of interest if they exist within the Excel file, extract the columns ID_MATCH and SET_GAME along with all other columns that starts with the pattern "id_complete_game_on" and finally create a column that will let me know the sheet name that this info was taken from when we bind vertically all the data selected. For example:

From File_Mother.xlsx I have sheet.name = "inning_195" with the following info

ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp
X01 M cleared not cleared
X85 Q cleared cleared
X89 U cleared cleared

From File_Mother.xlsx I have sheet.name = "inning_1988" with the following info

ID_MATCH SET_GAME id_complete_game_on_demand_one id_complete_game_on_camp
IKX01 MR completed not cleared
IKX02 DD not completed cleared
IKX03 TT draw cleared

after the desired data wrangling the output will be

ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp id_complete_game_on_demand_one sheet name
X01 M cleared not cleared NA inning_195
X85 Q cleared cleared NA inning_195
X89 U cleared cleared NA inning_195
IKX01 MR NA NA completed inning_1988
IKX02 DD NA NA not completed inning_1988
IKX03 TT NA NA draw inning_1988

Solution

  • I slightly edit reading all sheets of excel by @Jeromy Anglim.

    library(dplyr)
    library(purrr)
    
    read_excel_somesheets <- function(filename, key,tibble = FALSE) {
      sheets <- readxl::excel_sheets(filename)
      x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X) %>% mutate(sheet_name = X))
      if(!tibble) x <- lapply(x, as.data.frame)
      names(x) <- sheets
      y <- x[grepl(key, names(x))] %>%
        reduce(full_join)
      y
    }
    

    Then you may try

    read_excel_somesheets("your_path/File_Mother.xlsx", "inning_19")
    

    And result is like

      ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp  sheet_name id_complete_game_on_demand_one
    1      X01        M                   cleared              not cleared  inning_195                           <NA>
    2      X85        Q                   cleared                  cleared  inning_195                           <NA>
    3      X89        U                   cleared                  cleared  inning_195                           <NA>
    4    IKX01       MR                      <NA>              not cleared inning_1988                      completed
    5    IKX02       DD                      <NA>                  cleared inning_1988                  not completed
    6    IKX03       TT                      <NA>                  cleared inning_1988                           draw