I have an Excel file that has multiple sheetnames
and these sheet names don't always have the same structure. I want to be able to read the Excel file, read only some specifics sheets, select some specific columns and then create a concatenated table I have the following Excel file:
File_Mother.xlsx
it has 5 sheets but we would only need those sheets that have the pattern "inning_19" at the beginning of their names so "inning_195" and "inning_1988" will be of interest if they exist within the Excel file, extract the columns ID_MATCH
and SET_GAME
along with all other columns that starts with the pattern "id_complete_game_on
" and finally create a column that will let me know the sheet name that this info was taken from when we bind
vertically all the data selected. For example:
From File_Mother.xlsx
I have
sheet.name = "inning_195" with the following info
ID_MATCH | SET_GAME | id_complete_game_on_field | id_complete_game_on_camp | |
---|---|---|---|---|
X01 | M | cleared | not cleared | |
X85 | Q | cleared | cleared | |
X89 | U | cleared | cleared |
From File_Mother.xlsx
I have
sheet.name = "inning_1988" with the following info
ID_MATCH | SET_GAME | id_complete_game_on_demand_one | id_complete_game_on_camp | |
---|---|---|---|---|
IKX01 | MR | completed | not cleared | |
IKX02 | DD | not completed | cleared | |
IKX03 | TT | draw | cleared |
after the desired data wrangling the output will be
ID_MATCH | SET_GAME | id_complete_game_on_field | id_complete_game_on_camp | id_complete_game_on_demand_one | sheet name |
---|---|---|---|---|---|
X01 | M | cleared | not cleared | NA | inning_195 |
X85 | Q | cleared | cleared | NA | inning_195 |
X89 | U | cleared | cleared | NA | inning_195 |
IKX01 | MR | NA | NA | completed | inning_1988 |
IKX02 | DD | NA | NA | not completed | inning_1988 |
IKX03 | TT | NA | NA | draw | inning_1988 |
I slightly edit reading all sheets of excel by @Jeromy Anglim.
library(dplyr)
library(purrr)
read_excel_somesheets <- function(filename, key,tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X) %>% mutate(sheet_name = X))
if(!tibble) x <- lapply(x, as.data.frame)
names(x) <- sheets
y <- x[grepl(key, names(x))] %>%
reduce(full_join)
y
}
Then you may try
read_excel_somesheets("your_path/File_Mother.xlsx", "inning_19")
And result is like
ID_MATCH SET_GAME id_complete_game_on_field id_complete_game_on_camp sheet_name id_complete_game_on_demand_one
1 X01 M cleared not cleared inning_195 <NA>
2 X85 Q cleared cleared inning_195 <NA>
3 X89 U cleared cleared inning_195 <NA>
4 IKX01 MR <NA> not cleared inning_1988 completed
5 IKX02 DD <NA> cleared inning_1988 not completed
6 IKX03 TT <NA> cleared inning_1988 draw