rgoogle-sheetstidyversegooglesheets4

Importing date columns as lists from googledoc via googlesheets4 in R


I am trying to read this googledoc via googlesheets4.

The file contains 3 columns:

  1. id, as character
  2. start_date / end_date: this columns contain both dates (in the format 2020-01-31) and text strings (namely strings being in_stock and in_use)

I made this googlesheet available to read with anybody with the link.

I read the dataframes as follows:

suppressPackageStartupMessages({
  library(tidyverse)
  library(lubridate)
  library(readxl) 
  library(googlesheets4)})

df <- read_sheet('1rS_Zw4Qc8S4oV5-u_YZJACNBqu3VDQUMf9N_dGrLUo8',sheet = 'test_import')

While it seems initially fine, when I check how it was imported, here's the result:

> head(df)
# A tibble: 6 x 3
     id start_date end_date 
  <dbl> <list>     <list>   
1     1 <chr [1]>  <chr [1]>
2     2 <chr [1]>  <chr [1]>
3     3 <chr [1]>  <chr [1]>
4     4 <chr [1]>  <chr [1]>
5     5 <chr [1]>  <chr [1]>
6     6 <chr [1]>  <chr [1]>

While RStudio shows correctly the dataframe in the preview, the columns start_date and end_date are loaded as lists (which never happened to me).

When I try to convert the columns into dates by doing:

df %>%
  mutate(start_date = as.Date(start_date))

I get the error:

Error in as.Date.default(start_date) : 
  do not know how to convert 'start_date' to class “Date”

How can I correctly import this while preserving start_date and end_date as dates columns in order to perform operations with them?

Thanks for the help.

    > sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] googlesheets4_0.2.0.9000 readxl_1.3.1             lubridate_1.7.4          forcats_0.4.0            stringr_1.4.0            dplyr_0.8.3             
 [7] purrr_0.3.4              readr_1.3.1              tidyr_1.0.0              tibble_3.0.1             ggplot2_3.2.1            tidyverse_1.2.1         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        cellranger_1.1.0  pillar_1.4.4      compiler_3.6.1    tools_3.6.1       packrat_0.5.0     googledrive_1.0.1 jsonlite_1.6.1   
 [9] lifecycle_0.2.0   nlme_3.1-141      gtable_0.3.0      lattice_0.20-38   pkgconfig_2.0.3   rlang_0.4.6       cli_2.0.2         rstudioapi_0.10  
[17] haven_2.1.1       withr_2.2.0       xml2_1.2.2        httr_1.4.1        generics_0.0.2    vctrs_0.3.1       hms_0.5.1         grid_3.6.1       
[25] tidyselect_0.2.5  glue_1.4.1        R6_2.4.1          fansi_0.4.1       modelr_0.1.5      magrittr_1.5      backports_1.1.4   scales_1.0.0     
[33] ellipsis_0.3.1    rvest_0.3.4       assertthat_0.2.1  colorspace_1.4-1  stringi_1.4.6     lazyeval_0.2.2    munsell_0.5.0     broom_0.5.2      
[41] crayon_1.3.4  

Solution

  • try to do it this way (dplyr 1.0.0). You have the text in_stock and in_use in the date column. Therefore, an error appears.

    library(tidyverse)
    df <- read_sheet('1rS_Zw4Qc8S4oV5-u_YZJACNBqu3VDQUMf9N_dGrLUo8',sheet = 'test_import', col_types = "c")
    df %>% 
      mutate(across(ends_with("_date"), ymd)) 
    

    or old dplyr

    df %>% 
      mutate_at(vars(ends_with("_date")), ymd)