r tidyr

How to determine which column caused an error


I have a data.frame with many list columns.

df <-
  tibble::tibble(
    a = list(c(1,2), c(3,4)),
    b = list(c(1, 2), 'a')
  ) 

I would like to unnest() all of them at once. However, one of them is problematic.

tidyr::unnest(df, everything())
#> Error in `list_unchop()`:
#> ! Can't combine `x[[1]]` <double> and `x[[2]]` <character>.
#> Backtrace:
#>      x
#>   1. +-tidyr::unnest(df, everything())
#>   2. +-tidyr:::unnest.data.frame(df, everything())
#>   3. | \-tidyr::unchop(...)
#>   4. |   \-tidyr:::df_unchop(...)
#>   5. |     \-vctrs::list_unchop(col, ptype = col_ptype)
#>   6. \-vctrs (local) `<fn>`()
#>   7.   \-vctrs::vec_default_ptype2(...)
#>   8.     +-base::withRestarts(...)
#>   9.     | \-base (local) withOneRestart(expr, restarts[[1L]])
#>  10.     |   \-base (local) doWithOneRestart(return(expr), restart)
#>  11.     \-vctrs::stop_incompatible_type(...)
#>  12.       \-vctrs:::stop_incompatible(...)
#>  13.         \-vctrs:::stop_vctrs(...)
#>  14.           \-rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

How can I determine which column caused the error without repeatedly removing one column and rerunning the code? Is it somewhere in the backtrace?

Created on 2024-06-04 with reprex v2.0.2


Solution

  • The question being asked is how to identify which list columns contain incompatible types, whereas the other answers so far identify which list columns contain different types. These are separate questions: elements of different types, such as integer and double, can still be combined.

    The rules for vector compatibility of the fundamental vectors in the tidyverse are documented in help("faq-compatibility-types", package = vctrs), which states:

    Two vectors are compatible when you can safely:

    • Combine them into one larger vector.

    • Assign values from one of the vectors into the other vector.

    In general, the common type is the richer type, in other words the type that can represent the most values. Logical vectors are at the bottom of the hierarchy of numeric types because they can only represent two values (not counting missing values). Then come integer vectors, and then doubles. Here is the vctrs type hierarchy for the fundamental vectors:

    [Image: diagram of the vctrs type hierarchy for the fundamental vectors]

    Even more detail is provided in help("theory-faq-coercion", package = vctrs).
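
    As a quick illustration of the hierarchy described above, here is a minimal sketch using vctrs::vec_ptype_common() directly. The variable name res and the "incompatible" sentinel string are illustrative choices only, not part of vctrs.

```r
library(vctrs)

# logical and integer are compatible: the richer type (integer) is the common type
vec_ptype_common(TRUE, 1L)

# integer and double are compatible: the richer type (double) is the common type
vec_ptype_common(1L, 2.5)

# double and character have no common type, so vctrs signals an error;
# here the error is caught and replaced by an illustrative sentinel value
res <- tryCatch(
  vec_ptype_common(1, "a"),
  vctrs_error_incompatible_type = function(e) "incompatible"
)
res
```

    The last call is the same situation as column b in the question, where a double vector and a character vector sit in one list column.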

    The function vctrs::vec_ptype_common() identifies whether a set of vectors can be cast to a common type by returning the common type or, if not, an error. By wrapping this function with purrr::possibly() to return NULL instead of an error, list columns can be checked to see if they contain incompatible types.

    library(vctrs)
    library(purrr)
    
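    # Returns a named logical vector: TRUE for each list column whose
    # elements cannot be cast to a common type, FALSE otherwise.
    incompatible <- function(df) {
      # vec_ptype_common() errors on incompatible types; return NULL instead
      safe_ptype <- possibly(vec_ptype_common, otherwise = NULL)
      map_lgl(df, \(x) is.list(x) && is.null(do.call(safe_ptype, x)))
    }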
    
    incompatible(df)
    
        a     b 
    FALSE  TRUE
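
    An alternative that may also help is to automate the "try one column at a time" approach, so that the error message is kept alongside the column name. The following is only a sketch: unnest_errors() is a hypothetical helper written for this answer, not a tidyr function, and it assumes each list column can be unnested on its own.

```r
library(tibble)
library(tidyr)
library(purrr)

df <- tibble(
  a = list(c(1, 2), c(3, 4)),
  b = list(c(1, 2), "a")
)

# Hypothetical helper: unnest each list column by itself and
# keep the error message for every column that fails.
unnest_errors <- function(df) {
  list_cols <- names(df)[map_lgl(df, is.list)]
  msgs <- map(set_names(list_cols), \(col) {
    tryCatch(
      {
        unnest(df[col], all_of(col))
        NULL  # no error for this column
      },
      error = function(e) conditionMessage(e)
    )
  })
  compact(msgs)  # drop the columns that unnested without error
}

names(unnest_errors(df))
#> [1] "b"
```

    The returned list is named by column, so unnest_errors(df)$b holds the message that unnest() produced for that column.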