r google-cloud-ml

Reading a CSV file in R from a Google Cloud Storage bucket with cloudml or googleCloudStorageR


I want to read a CSV file from a Google Cloud Storage bucket.

With the googleCloudStorageR library:

  bucket_name  <- "xxxxx"
  gfs_tmp_file <- "xxx.csv"
  # Set the default bucket
  googleCloudStorageR::gcs_global_bucket(bucket_name)
  # Download the object by its name
  gfs_file <- googleCloudStorageR::gcs_get_object(gfs_tmp_file)

But here gfs_file contains raw data (a raw vector), and I don't know how to convert it to an R data.frame:

√ Downloaded and parsed gfs_data_temp.csv into R object of class: raw
   [1] 2c 44 41 54 5f 52 55 4e 2c 44 41 54 5f 46 4f 52 45 43 41 53 54 2c 4c 49 42 5f 53 4f 55 52 43 45 2c 4d 45 53 5f 4c
  [39] 4f 4e 47 49 54 55 44 45 2c 4d 45 53 5f 4c 41 54 49 54 55 44 45 2c 4d 45 53 5f 54 45 4d 50 45 52 41 54 55 52 45 2c
  [77] 4d 45 53 5f 48 55 4d 49 44 49 54 45 2c 4d 45 53 5f 50 4c 55 49 45 2c 4d 45 53 5f 56 49 54 45 53 53 45 5f 56 45 4e
With the cloudml library, it seems easier.

Not tested:

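library(cloudml)
# Build the gs:// URL from the bucket name defined above
data_dir <- gs_data_dir(paste0("gs://", bucket_name))
# Path to the CSV object inside the bucket
gfs_file <- file.path(data_dir, gfs_tmp_file)
mtcars_dataset <- csv_dataset(gfs_file)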

So what is the best way to download a file from a Google Cloud Storage bucket and store it in an R data.frame?


Solution

  • The googleCloudStorageR library returns the contents of the file you read as raw data. What you can do is put that data into a data frame yourself:

    data_frame <- data.frame( column_name1 = vector1, column_name2 = vector2 )

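    Where column_name1 and column_name2 are the column names, and vector1 and vector2 are the vectors holding each column's values.

    You can find more information here.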

    Additionally, the cloudml documentation doesn’t say in what form it returns the data, so you should try it and check whether the result is what you want or whether you need to build the data frame manually.
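One way to turn a raw vector like the one in the question into a data.frame is to decode the bytes to text and hand that text to read.csv. The sketch below uses only base R and does not contact Google Cloud Storage. The raw vector csv_raw is a hypothetical stand-in for the object returned by gcs_get_object(). Its column names are a subset of those visible in the question's output, and the data values are made up for illustration.

```r
# Hypothetical stand-in for the raw vector returned by gcs_get_object().
# Column names come from the question's output; the values are invented.
csv_raw <- charToRaw(paste0(
  "DAT_RUN,MES_LONGITUDE,MES_LATITUDE\n",
  "2020-01-01,2.35,48.85\n",
  "2020-01-02,4.83,45.76\n"
))

# Decode the bytes into a single character string ...
csv_text <- rawToChar(csv_raw)

# ... and parse that string as CSV instead of reading from a file.
gfs_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(gfs_df)
```

With the real download, the same two steps would apply to gfs_file in place of csv_raw. The documentation for gcs_get_object() also describes saveToDisk and parseFunction arguments that may make the manual conversion unnecessary. Check the reference for your installed version before relying on them.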