rxlsx

How to unread hide Excel sheet in R (read_excel)?


Would like to know is there any method or techniques which can ignore the hidden excel_sheet in an Excel workbook (read_excel).

Why I am asking is, Have many Excel files in that many sheets are hidden.

What I have tried so far is referred below link.

How to ignore hidden data when importing from Excel


Solution

  • Here's a function that will inform whatever function you want to use, whether it's readxl::read_excel or otherwise. (I've tested this with a couple of .xlsx files, but nothing extensive.)

    This requires the xml2 package.

    xlsx_sheet_info <- function(filename) {
      stopifnot(file.exists(filename))
      tmpdir <- tempfile(pattern = "xldir")
      fn <- try(
        utils::unzip(filename, files = "xl/workbook.xml", exdir = tmpdir, junkpaths = TRUE),
        silent = TRUE)
      if (inherits(fn, "try-error")) stop("unable to find 'xl/workbook.xml'")
      on.exit({
        # clean up our temporary directory used to extract the file
        if (dir.exists(tmpdir)) {
          suppressWarnings(unlink(tmpdir, recursive = TRUE, force = TRUE))
        }
      })
      xml <- try(xml2::read_xml(fn), silent = TRUE)
      if (inherits(xml, "try-error")) stop("unable to parse xml")
      # everything we want in sheets is stored as element attributes
      sheets <- lapply(xml2::as_list(xml)$workbook$sheets, attributes)
      nms <- unique(unlist(lapply(sheets, names)))
      out <- do.call(rbind.data.frame, lapply(sheets, function(sh) {
        # I believe attributes will always be 'character'
        sh[setdiff(nms, names(sh))] <- NA_character_
        # return them in-order, safe for 'rbind.data.frame'
        sh[nms]
      }))
      out[] <- lapply(out, type.convert, as.is = TRUE)
      out
    }
    

    I created a simple workbook with three worksheets, with various levels of "hidden"-ness: xlSheetVisible, xlSheetHidden, and xlSheetVeryHidden (per Excel VBA enums).

    info <- xlsx_sheet_info("Book1.xlsx")
    info
    #          name sheetId   id      state
    # sheet  Sheet1       1 rId1       <NA>
    # sheet1 Sheet3       3 rId2 veryHidden
    # sheet2 Sheet2       2 rId3     hidden
    

    This is a data.frame, so you can use is.na(info$state) (or grep for "hidden") to choose specific rows, either by name or sheetId (I assume these are monotonic integers, compatible with readxl::read_xlsx and friends).

    I would not assume that the order of the sheets is preserved in the rows, as is suggested here. In fact, I created the sheets in order: visible, hidden, very-hidden.