
Error when using bind_rows where one data frame has a column of type <hash>


I want to bind the rows of two data frames, where the first one contains a column of hash values and the second one doesn't contain that column at all.

library(tidyverse)
library(openssl)

df <- data.frame(x = sha3(letters[1:3], size = 512),
                 y = 1:3)

df2 <- data.frame(y = 4:6)

df |>
  bind_rows(df2)

When trying to bind rows, I get the following error:

Error in `bind_rows()`:
! Can't combine `..1` <hash> and `..2` <vctrs:::common_class_fallback>.
Run `rlang::last_trace()` to see where the error occurred.

I roughly understand where this comes from: the second data frame doesn't contain the x column. However, my expected/desired output is that bind_rows() still works and simply fills the x column with NAs for the rows coming from df2.

EDIT:

I created a small workaround by converting the hash column in df back to character, but I am still curious whether I can prevent the error in the first place.
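A minimal sketch of that workaround, assuming (as the class names and the answer's result suggest) that the hash column is a plain character vector underneath. unclass() drops the c("hash", "sha3-512") class, so vctrs only sees an ordinary character column and bind_rows() can fill it with NA for the rows from df2. The price is that x is no longer a <hash> afterwards.

```r
library(dplyr)
library(openssl)

df  <- data.frame(x = sha3(letters[1:3], size = 512), y = 1:3)
df2 <- data.frame(y = 4:6)

# Drop the <hash> class so vctrs treats x as a plain character column
df$x <- unclass(df$x)

# Now the missing x column in df2 is filled with NA
res <- bind_rows(df, df2)
class(res$x)
```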


Solution

  • The issue is not specific to hashes. It occurs with any S3 class for which no c() method is defined.

    bind_rows() is built on vctrs::vec_rbind(), whose documentation says:

    If columns to combine inherit from a common class, vec_rbind() falls back to base::c() if there exists a c() method implemented for this class hierarchy.

    This isn't exactly our case (there is no common class), but the error message does mention <vctrs:::common_class_fallback>. If we define a simple c.hash() method, bind_rows() no longer fails. Unfortunately, the result doesn't preserve the class: x comes back as a plain character vector. I don't know why.

    library(openssl)
    library(dplyr)
    
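    # Minimal c() method for "hash" objects: strip the class from every
    # argument, combine the underlying vectors with base c(), then restore
    # the class of the first argument.
    c.hash <- function(...) {
      args <- list(...)
      cls <- oldClass(args[[1]])   # e.g. c("hash", "sha3-512")
      res <- do.call(c, lapply(args, unclass))
      oldClass(res) <- cls
      res
    }
    
    x <- sha3(letters[1:3], size = 512)
    class(c(x, NA))
    #[1] "hash"     "sha3-512"
    
    df <- data.frame(x = sha3(letters[1:3], size = 512),
                     y = 1:3)
    
    df2 <- data.frame(y = 4:6)
    
    # bind_rows() no longer errors, but x loses its <hash> class
    test <- df |>
      bind_rows(df2)
    class(test$x)
    #[1] "character"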
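If the hash class matters downstream, one possible sketch is to automate the question's workaround and then put the class back by hand: strip the class from every column that inherits from "hash", bind, and re-attach the class recorded from the first data frame that had that column. bind_rows_hash() is a hypothetical helper, not part of dplyr or openssl, and it assumes that same-named hash columns share the same class (e.g. all sha3-512). It side-steps the vctrs fallback rather than teaching vctrs about the class.

```r
library(dplyr)
library(openssl)

# Hypothetical helper: bind data frames that may contain <hash> columns
bind_rows_hash <- function(...) {
  dfs <- list(...)
  hash_classes <- list()   # column name -> class vector to restore later
  for (i in seq_along(dfs)) {
    for (nm in names(dfs[[i]])) {
      col <- dfs[[i]][[nm]]
      if (inherits(col, "hash")) {
        # Remember the class of the first hash column with this name
        if (is.null(hash_classes[[nm]])) hash_classes[[nm]] <- oldClass(col)
        # Hand vctrs a plain vector it knows how to combine
        dfs[[i]][[nm]] <- unclass(col)
      }
    }
  }
  res <- bind_rows(dfs)
  # Re-attach the remembered classes; rows from inputs without the
  # column are NA
  for (nm in names(hash_classes)) {
    oldClass(res[[nm]]) <- hash_classes[[nm]]
  }
  res
}

df  <- data.frame(x = sha3(letters[1:3], size = 512), y = 1:3)
df2 <- data.frame(y = 4:6)

res <- bind_rows_hash(df, df2)
class(res$x)
```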