rtibble

How to return the column types of an R tibble as a compact string?


For example, I have a tibble like this: test <- tibble(a = 10, b = "a")

With this input, I want a function that returns "dc", which represents double and character.

The reason I ask is that I want to read in lots of files, and I don't want to let the read_table function decide the type of each column. I can specify the string manually, but since the actual data I want to import has 50 columns, that is quite hard to do by hand.

Thanks.


Solution

  • While the aforementioned test %>% summarise_all(class) gives you the class names of the columns, it does so in long form, whereas in this problem you need to convert them to the single-character codes that read_table's col_types understands. To map from class names to single-letter codes you can use a lookup table; here's an (incomplete) example created with dput:

    types <- structure(list(col_type = c("character", "integer", "numeric", 
    "double", "logical"), code = c("c", "i", "n", "d", "l")), .Names = c("col_type", 
    "code"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
    -5L))
    

    Using this table, which I'll call types, we can transform the column types into a single string:

    library(dplyr)
    library(tidyr)
    library(stringr)
    
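    # Collect each column's class, reshape to one row per column,
    # look up the one-letter code, and collapse the codes into one string.
    test %>% 
      summarise_all(class) %>% 
      gather(col_name, col_type) %>% 
      left_join(types, by = "col_type") %>% 
      summarise(col_types = str_c(code, collapse = "")) %>% 
      unlist(use.names = FALSE)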
    

    This gets the class of each column (summarise_all), then gathers the results into a tibble that pairs each column name with its column type (gather). The left_join matches on the col_type column and gives the one-character code for each column. We don't need the column names after that, so it's fine to concatenate the codes with summarise and str_c. Finally, unlist pulls the string out of the tibble.
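The same lookup idea can also be sketched without the reshaping steps, using a named character vector in place of the join. This is only a sketch under a few assumptions: compact_col_types is a hypothetical helper name, the code table is my own choice rather than anything exhaustive, and classes that are not listed fall back to "?" so that readr guesses those columns. Note that class() reports double columns as "numeric", so the sketch maps "numeric" to "d" (double) in order to produce the "dc" asked for in the question.

```r
library(tibble)

# Hypothetical helper: map each column's first class to a readr col_types code.
compact_col_types <- function(df) {
  # Assumed lookup table; anything not listed here falls back to "?" (guess).
  codes <- c(character = "c", integer = "i", numeric = "d", logical = "l",
             factor = "f", Date = "D", POSIXct = "T")
  # Take only the first class so multi-class columns (e.g. POSIXct) yield one value.
  first_class <- vapply(df, function(col) class(col)[1], character(1))
  out <- unname(codes[first_class])
  out[is.na(out)] <- "?"
  paste(out, collapse = "")
}

test <- tibble(a = 10, b = "a")
compact_col_types(test)
# "dc"
```

The resulting string can then be passed as the col_types argument when reading the remaining files, for example read_table(file, col_types = compact_col_types(test)).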