r dataframe types

How can I set the column types of several columns in an empty R data frame?


I'm trying to create an empty dataframe with specific column names and column types. What I have is a function that receives a list with this structure:

list$
    $name_1$
           $class
           $more_stuff
    $name_2$
           $class
           $more_stuff
    ...

So, I create an empty dataframe:

df <- data.frame(matrix(ncol = length(names(my_list)), nrow = 0))
colnames(df) <- names(my_list)
# set column types
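
A quick check of what this starting point gives, as a minimal sketch: the two-entry `my_list` here is a hypothetical stand-in for the real list. Because an empty `matrix()` is logical, every column starts out as logical, which is why the types have to be set explicitly.

```r
# Hypothetical two-entry spec, used only to supply column names.
my_list <- list(
  name_1 = list(class = "character"),
  name_2 = list(class = "datetime")
)

# Same construction as above: an empty matrix turned into a data frame.
df <- data.frame(matrix(ncol = length(my_list), nrow = 0))
colnames(df) <- names(my_list)

# First class of each column; all of them are "logical" at this point.
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)
```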

Every item of the list has a 'class' element that holds the data type. The possible values are "character", "numeric", "datetime" and "boolean", and they must be converted into the following column types:

"character" -> character
"numeric"   -> numeric
"datetime"  -> datetime<UTC>
"boolean"   -> character

This is because the dataframe will be joined to another one whose columns have these types.

I tried something like this, but the datetime column was incorrect:

  for (i in seq_along(my_list)) {
    cast_function <- switch(my_list[[i]]$class,
                            'character' = as.character,
                            'numeric'   = as.numeric,
                            'datetime'  = as.POSIXct,
                            'boolean'   = as.character)

    df[[i]] <- cast_function(df[[i]])
  }

How could I perform this operation? Is there any better way?

Thanks in advance.


Solution

  • I think this snippet contains all the elements you might need:

    lst <- list(
      x = list("character", letters[1:5]),
      y = list("numeric", as.double(1:5)),
      w = list("boolean", as.character(as.integer(c(
        TRUE, FALSE, FALSE
      )))),
      z = list("datetime", Sys.time())
    )
    
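    # class label of each entry, named by column (x, y, w, z)
    my_classes <- unlist(lapply(lst, function(x) x[1]))
    
    # zero-length prototype per class label; datetimes are POSIXct in UTC
    mapping <- list(
      character = character(),
      numeric = numeric(),
      datetime = structure(double(), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
      boolean = character()
    )
    
    # name the columns after the list entries rather than the class labels
    do.call(tibble::tibble, args = setNames(mapping[my_classes], names(lst)))
    #> # A tibble: 0 × 4
    #> # … with 4 variables: x <chr>, y <dbl>, w <chr>, z <dttm>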
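
If you would rather stay in base R, the same idea can be sketched without tibble. This is only an illustration under a few assumptions: the list is shaped as in the question (each entry has a `class` element), `empty_frame_from_spec` and `prototypes` are hypothetical names, and the sample `my_list` is made up. Columns are named after the list entries, and the datetime prototype carries an explicit UTC time zone.

```r
# Hypothetical spec shaped like the list in the question.
my_list <- list(
  id      = list(class = "numeric",   more_stuff = NULL),
  label   = list(class = "character", more_stuff = NULL),
  created = list(class = "datetime",  more_stuff = NULL),
  active  = list(class = "boolean",   more_stuff = NULL)
)

# Zero-length prototype per class label, following the mapping in the
# question: booleans become character, datetimes become POSIXct in UTC.
prototypes <- list(
  character = character(),
  numeric   = numeric(),
  datetime  = structure(numeric(), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  boolean   = character()
)

# Build a zero-row data frame with one typed column per spec entry.
empty_frame_from_spec <- function(spec) {
  classes <- vapply(spec, function(item) item$class, character(1))
  unknown <- setdiff(classes, names(prototypes))
  if (length(unknown) > 0) {
    stop("Unsupported class label(s): ", paste(unknown, collapse = ", "))
  }
  cols <- setNames(prototypes[classes], names(spec))
  do.call(data.frame, c(cols, list(stringsAsFactors = FALSE, check.names = FALSE)))
}

empty_df <- empty_frame_from_spec(my_list)
str(empty_df)
```

Selecting prototypes by class label and renaming them afterwards means two columns can share the same class without clashing, and an unexpected label fails loudly instead of silently producing a wrongly typed column.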