
Error handling for tidyr::hoist() in a dplyr pipe when a column's type changes between API calls


I'm using the Jira native API to pull dashboard data into R and rectangularize it. Each call returns at most 100 results. My dashboard is small at the moment, but I'm thinking ahead to problems that may arise later. If I work with a max results of 25, the columns change between calls: sometimes a column is completely empty, and sometimes it holds a nested list for some of its values.

I was grabbing values from the nested lists with tidyr::hoist(), but that errors if a call returns no nested list in the column.

library(tidyr)
library(dplyr)

# Create sample sets for question
sample_one <- structure(
  list(
    id = c("17454", "17452", "17396"),
    Resolution = list(
      NULL,
      NULL,
      list(
        self = "task_set",
        id = "10000",
        description = "Work has been completed on this issue.",
        name = "Done"
      )
    ),
    Comms = list(NA, NA, list(self = "task_set", value = "Yes"))
  ),
  row.names = c(NA, -3L),
  class = c("tbl_df", "tbl", "data.frame")
)

sample_two <- structure(
  list(
    id = c("17454", "17452", "17396"),
    Resolution = list(
      NULL,
      NULL,
      list(
        self = "task_set",
        id = "10000",
        description = "Work has been completed on this issue.",
        name = "Done"
      )
    ),
    Comms = c(NA, NA, NA)
  ),
  row.names = c(NA, -3L),
  class = c("tbl_df", "tbl", "data.frame")
)

These two sets are identical except for the Comms column: in the first it is a list column, and in the second it contains only NA values.

output_one <- sample_one |>
  tidyr::hoist(Resolution, Resolution_name = "name") |>
  tidyr::hoist(Comms, Comms_Needed = "value")

output_one

This yields the expected output:

> output_one
# A tibble: 3 × 5
  id    Resolution_name Resolution       Comms_Needed Comms           
  <chr> <chr>           <list>           <chr>        <list>          
1 17454 NA              <NULL>           NA           <lgl [1]>       
2 17452 NA              <NULL>           NA           <lgl [1]>       
3 17396 Done            <named list [3]> Yes          <named list [1]>

output_two <- sample_two |>
  tidyr::hoist(Resolution, Resolution_name = "name") |>
  tidyr::hoist(Comms, Comms_Needed = "value")

output_two

However, in the output_two scenario the function errors out.

I've been looking into error handling and am currently experimenting with purrr::possibly().

safe_hoist <- purrr::possibly(tidyr::hoist, otherwise = NA)

output_one <- sample_one |>
  safe_hoist(Resolution, Resolution_name = "name") |>
  safe_hoist(Comms, Comms_Needed = "value")

While this works fine in the first case, in the second case it replaces the entire data frame with NA.

I'm curious whether there is something better I can pass as the otherwise argument so that just the Comms column comes back as all NAs, or whether there is a better way to handle this kind of situation.

My preference would be to keep everything in a dplyr pipe, but at the end of the day, as long as it still works in the while loop that I use to send requests to the Jira API, it's fine.


Solution

  • You won't be able to use possibly() for this. The otherwise value is static: it is fixed when the wrapped function is created, so, as you discovered, the call returns either a data frame with the hoisted columns or a single NA value. You could use tryCatch() directly, but rather than catching the error it is likely better to write a function that behaves conditionally. I'm not sure whether by "return the Comms column as all NA" you meant returning the data unchanged (Comms is already all NA in that case) or creating a Comms_Needed column that is all NA. I've assumed the latter.

    library(dplyr)
    library(tidyr)
    
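    # Hoist from .col when it is a list column; otherwise add the requested
    # columns filled with NA so that later steps can rely on them existing.
    safe_hoist <- function(.data, .col, ...) {
      # Resolve the unquoted column to its name as a string
      .col <- tidyselect::vars_pull(names(.data), {{ .col }})
      if (is.list(.data[[.col]])) {
        hoist(.data, .col, ...)
      } else {
        dot_args <- list(...)
        # Drop hoist() options such as .remove; keep only the new column names
        dot_args <- dot_args[setdiff(names(dot_args), names(formals(hoist)))]
        # Create each requested column as NA
        mutate(.data, !!!replace(dot_args, TRUE, NA))
      }
    }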
    
    
    sample_one |>
      safe_hoist(Resolution, Resolution_name = "name") |>
      safe_hoist(Comms, Comms_Needed = "value")
    
    # A tibble: 3 × 5
      id    Resolution_name Resolution       Comms_Needed Comms           
      <chr> <chr>           <list>           <chr>        <list>          
    1 17454 NA              <NULL>           NA           <lgl [1]>       
    2 17452 NA              <NULL>           NA           <lgl [1]>       
    3 17396 Done            <named list [3]> Yes          <named list [1]>
    
    sample_two |>
      safe_hoist(Resolution, Resolution_name = "name") |>
      safe_hoist(Comms, Comms_Needed = "value")
    
    # A tibble: 3 × 5
      id    Resolution_name Resolution       Comms Comms_Needed
      <chr> <chr>           <list>           <lgl> <lgl>       
    1 17454 NA              <NULL>           NA    NA          
    2 17452 NA              <NULL>           NA    NA          
    3 17396 Done            <named list [3]> NA    NA
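
For comparison, the tryCatch() route mentioned above could look roughly like the sketch below. The name try_hoist and the two small example tibbles are hypothetical and not part of the original answer. It assumes that returning the input unchanged is an acceptable fallback.

```r
library(tidyr)
library(tibble)

# Hypothetical wrapper (an assumption, not from the original answer):
# attempt hoist() and, if it raises any error, return the input unchanged.
try_hoist <- function(.data, ...) {
  tryCatch(
    hoist(.data, ...),
    error = function(e) .data
  )
}

# Made-up pages mimicking the two situations in the question
page_list <- tibble(id = c("1", "2"), Comms = list(NA, list(value = "Yes")))
page_na   <- tibble(id = c("1", "2"), Comms = c(NA, NA))

hoisted   <- try_hoist(page_list, Comms, Comms_Needed = "value")
unchanged <- try_hoist(page_na, Comms, Comms_Needed = "value")
```

This version has two drawbacks. It swallows every error, including ones caused by a mistyped column name, and in the fallback case no Comms_Needed column is created. That is why the conditional safe_hoist() is likely the safer choice when later steps expect the column to exist.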