rtidyverseexportreshape2r-xlsx

How to combine columns of different excel files and renaming the column with the name of the initial file using R?


I have 96 different Excel files in a folder that all contain one single column, which looks approximately like this, whereby the uppermost line is the name of the Excel file:

A1.xlsx
Distance
245
534
142
6342
634
4343
2323
1123
666

A2.xlsx
Distance
2453
2452
4434
3456
6754

However, it would like to have one single .xlsx file, where all the separate files are copied into a new column side by side containing the name of the source file as a header for the column. Furthermore, the current header should be removed. So, it should look like this then:

A1     A2     ...    Cn
245    2453   ...    ...
534    2452   ...
142    4434   ...
6342   3456   ...
634    6754   ...
4343          ...
2323          ...
1123
666

So far, I the following code:

fileList <-file.path(dir(path = ".", pattern = "^[a-zA-Z].+$", full.names = TRUE), "*.xlsx")
load_if_exists <-function(filename, ...) {
  tryCatch(
    suppressWarnings(read.table(filename, ...)),
    error = function(x) tibble(NA)
  )
}

fileList <- list.files(".", "*.xlsx", full.names = TRUE)

listData <- lapply(fileList, read.table)

names(listData) <- gsub(".xlsx","",basename(fileList))

bind_rows(listData, .id = "FileName") %>%
  group_by(FileName) %>%
  mutate(rowNum = row_number()) %>%
  dcast(rowNum~FileName, value.var = "V1") %>%
  select(-rowNum) %>%
  write.xlsx(file="Result.xlsx")

With this code, a new file is generated, but the data is not available. Furthermore, there was also no error code.

Could please anybody help me solving this problem?

Thank you very much!


Solution

  • You can do this in with just a few steps - note, to combine them side-by-side you need to ensure they all have the same number of rows. So (1) import the data, (2) find the max number of rows in all the imported excel files, and (3) assign NA (or "" if you prefer) to the data frames that are shorter than that (4) combine them all (here, using do.call):

    flpth <- "your_filepath_here"
    
    fileList <- list.files(flpth, ".xlsx", full.names = TRUE)
    
    listData <- lapply(fileList, readxl::read_excel)
    
    maxLength <- max(unlist(lapply(listData, nrow)))
    
    lengthList <- lapply(listData, \(x){ 
      if(nrow(x) < maxLength) x[(nrow(x)+1):maxLength,] <- NA
      x})
    
    allData <- do.call(data.frame, lengthList)
    
    writexl::write_xlsx(allData, path = paste0(flpth, "new_file.xlsx"))
    

    Output:

        A1.xlsx  A2.xlsx
    1  Distance Distance
    2       245     2453
    3       534     2452
    4       142     4434
    5      6342     3456
    6       634     6754
    7      4343     <NA>
    8      2323     <NA>
    9      1123     <NA>
    10      666     <NA>
    

    enter image description here