SOF newbie here, so please bear with me and some daft questions!
Can anyone please tell me what the most efficient way would be to read in ~50+ excel files with multiple tabs into R, select a specific tab from all 50, and combine those tabs by date to create a large dataframe?
For example, let's assume I want to see what GP activity has been like in a specific healthcare region here in the UK for the last 4 years. I currently have to download many many monthly reports like this one Appointments in General Practice, March 2022
I then have to go to table 3a, scroll down to the NHS area code "07k", and then paste the activity from that row into a separate excel sheet.
I have to do this a lot, and it takes hours, but am not coding literate enough to figure out a better way of doing this.
From the guidance of others, my attempts to date have involved downloading many months' data to a desktop folder. I set my working drive to that folder, and then attempt the following:
files <- list.files(pattern = "*.xls", full.names = TRUE)
for(i in files){
filepath <- file.path("C:/Users/myname/Desktop/recordsfolder", i)
assign(i, readxl::read_xls(file path))
}
But the honest truth is I don't really understand this, nor what the next step would be.
If anyone could point me in a better direction I'd be very grateful.
Okay so there is a function readData
that reads and filters each of the table 3a's from the files in list.files
. You then apply this function to all of the files using map
and then bind_rows
to reduce the list of datasets to a single dataset.
I dont know how standardised your reports are so you may need to adjust the read function accordingly
library(readxl)
library(tidyverse)
readData <- function(path){
colnames <- read_excel(path, sheet = "Table 3a", skip = 10, n_max = 1) %>%
gather(header1, header2) %>%
mutate(across(everything(), ~gsub("\\.\\.\\.\\d+", "", .))) %>%
mutate(colname = ifelse(header1=="", header2, paste0(header1, "_", header2))) %>%
pull(colname)
read_excel(path, sheet = "Table 3a", skip = 13, col_names = colnames) %>%
filter(grepl("07k", `NHS Area Code`, ignore.case = TRUE))
}
files <- list.files("C:/Users/myname/Desktop/recordsfolder", full.name = TRUE)
files %>%
map(readData) %>%
bind_rows()