rr-rio

Importing multiple excel sheets into one dataframe adding the sheet name as variable


I am importing multiple excel sheets into one dataframe using the rio package.

WIOD_EA_EmRelEnergy <- import_list("EA 2016/Emission-relevant Energy Accounts_total.xlsx",
                                   setclass = "tbl", rbind = TRUE)

This line of code does already exactly what I want. It adds a column at the end called "file" indicating the the number of the sheet (data from the first sheet takes the value 1 and so on).

However, I am trying that column to take the name of the sheet rather than a number. The names of the sheets are country codes ("AUS", "AUT", etc.). Thus, the data of the first sheet should not take the value 1 but rather "AUS".

This should only be a small problem but i simply do not find the solution.


Solution

  • Simply use bind_rows() in dplyr and set the arg .id = "sheet", then data in each sheet will be row-bind together and a new column named what you set in .id is added to record the sheet names which the data come from.

    dplyr::bind_rows(
      import_list("path/to/file/test.xlsx", setclass = "tbl"),
      .id = "sheet"
    )
    

    Test

    Write out an excel file with 2 sheets named AUS and AUT:

    openxlsx::write.xlsx(
      list(AUS = data.frame(x = 1:2, y = 3:4),
           AUT = data.frame(x = 5:6, y = 7:8)),
      file = "test.xlsx"
    )
    

    Then

    dplyr::bind_rows(
      rio::import_list("test.xlsx", setclass = "tbl"),
      .id = "sheet"
    )
    
    # # A tibble: 4 × 3                                                                 
    #   sheet     x     y
    #   <chr> <dbl> <dbl>
    # 1 AUS       1     3
    # 2 AUS       2     4
    # 3 AUT       5     7
    # 4 AUT       6     8