rcsvimportr-googlesheets

Read European formatted csv with googlesheets package


I have trouble finding ways to import files efficiently that are formatted European style (a , comma for decimals and . dot for thousands 12.300,35) from google sheets via the library(googlesheets) package. R will automatically read the commas as delimiters and not as decimal signs.

When downloading the sheet (via R), it automatically chooses a comma separator. Can I customize that?

Here's my data via a shared google sheets link.

My code:

library(googlesheets)

# Authenticate with your google sheets
sheets <- gs_ls()

# Import
spreadsheet <- gs_title("sample_data")

# Read dataset
sample <- gs_read(spreadsheet, ws = 1)

With the wrong results:

 jaren hbo_procent wo_procent
   <int>       <dbl>      <dbl>
 1  2006       66.0        9.00
 2  2007       67.0       97.0 
 3  2008        7.00     104   
 4  2009       73.0      112   
 5  2010       75.0      119   
 6  2011       77.0      128   
 7  2012       78.0      137   
 8  2013       76.0      142   
 9  2014       74.0      148   
10  2015       75.0      163   
11  2016       75.0      181   
12  2017       78.0      195   

Solution

  • googlesheets has an interface like readr and specifically it offers the locale argument. You can read more about it here: http://readr.tidyverse.org/articles/locales.html

    In this case, we need to specify the decimal mark.

    I made my own copy of OP's Sheet. I'm not sure the original is correctly published to the web, which is necessary to make something world readable via the Sheets v3 API. This is (confusingly) different from making a Sheet "public on the web".

    library(googlesheets)
    
    ## I made a copy of OP's original Sheet
    spreadsheet <- gs_title("Copy of sample_data")
    #> Sheet successfully identified: "Copy of sample_data"
    
    # Read dataset
    sample <- gs_read(
      spreadsheet,
      ws = 1,
      locale = readr::locale(decimal_mark = ",")
    )
    #> Accessing worksheet titled 'Blad1'.
    #> Parsed with column specification:
    #> cols(
    #>   jaren = col_double(),
    #>   hbo_procent = col_double(),
    #>   wo_procent = col_double()
    #> )
    sample
    #> # A tibble: 12 x 3
    #>    jaren hbo_procent wo_procent
    #>    <dbl>       <dbl>      <dbl>
    #>  1 2006.        6.60       9.00
    #>  2 2007.        6.70       9.70
    #>  3 2008.        7.00      10.4 
    #>  4 2009.        7.30      11.2 
    #>  5 2010.        7.50      11.9 
    #>  6 2011.        7.70      12.8 
    #>  7 2012.        7.80      13.7 
    #>  8 2013.        7.60      14.2 
    #>  9 2014.        7.40      14.8 
    #> 10 2015.        7.50      16.3 
    #> 11 2016.        7.50      18.1 
    #> 12 2017.        7.80      19.5
    

    Alternatively, to make a Sheet that is easier for more people to "just read", you could do File > Spreadsheet settings and choose, e.g., the United States as the locale for the Sheet itself.