Tags: r, dplyr, google-bigquery, dbplyr, bigrquery

How to write virtual BQ table back to BQ using R DBI and bigrquery?


I want to be able to:

  1. Access a BQ table. This is class

[1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
[4] "tbl_lazy"               "tbl"

  2. Alter the table using dbplyr to create a new table. Again, this has class

[1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
[4] "tbl_lazy"               "tbl"

  3. Write this new table to BQ.

I get the following error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"BigQueryConnection", "character", "tbl_BigQueryConnection"’

MRE (minimal reproducible example)

library(DBI)
library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

############  CREATE BQ TABLE TO ACCESS  #################
dataset = bq_dataset(bq_test_project(), "test_dataset")

if (bq_dataset_exists(dataset)) {
  bq_dataset_delete(dataset, delete_contents = TRUE)
}
#> Suitable tokens found in the cache, associated with these emails:
#>   * ariel.balter@gmail.com
#>   * ariel.balter@providence.org
#> The first will be used.
#> Using an auto-discovered, cached token.
#> To suppress this message, modify your code or options to clearly consent to the use of a cached token.
#> See gargle's "Non-interactive auth" vignette for more details:
#> https://gargle.r-lib.org/articles/non-interactive-auth.html
#> The bigrquery package is using a cached token for ariel.balter@gmail.com.

bq_dataset_create(dataset)
#> <bq_dataset> elite-magpie-257717.test_dataset

conn = DBI::dbConnect(
  bigrquery::bigquery(),
  project = bq_test_project(),
  dataset = "test_dataset",
  # Note: KeyFilePath and OAuthMechanism appear to be ODBC driver options,
  # not bigrquery arguments; auth here is handled by the cached gargle token.
  KeyFilePath = "google_service_key.json",
  OAuthMechanism = 0
)


if (dbExistsTable(conn, "mtcars")) {
  dbRemoveTable(conn, "mtcars")
}

dbWriteTable(conn, "mtcars", mtcars)

#######################################################


### Access BQ table
mtcars_tbl = tbl(conn, "mtcars")
class(mtcars_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

### Create new virtual table
hp_gt_100_tbl = mtcars_tbl %>% filter(hp>100)
class(hp_gt_100_tbl)
#> [1] "tbl_BigQueryConnection" "tbl_dbi"                "tbl_sql"               
#> [4] "tbl_lazy"               "tbl"

### Write new table
dbWriteTable(conn, "hp_gt_100", hp_gt_100_tbl)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dbWriteTable' for signature '"BigQueryConnection", "character", "tbl_BigQueryConnection"'

dbExecute(conn, "DROP TABLE mtcars")
#> [1] 0
dbExecute(conn, "DROP TABLE hp_gt_100")
#> Error: Job 'elite-magpie-257717.job_O8e7BtdfAnAb_8Vdtwybibgd7DpA.US' failed
#> x Not found: Table elite-magpie-257717:test_dataset.hp_gt_100 [notFound]

Created on 2020-11-11 by the reprex package (v0.3.0)


Solution

  • I don't think you can do this with dbWriteTable using your current approach. dbWriteTable "writes, overwrites or appends a [local] data frame to a database table" (per the DBI documentation for dbWriteTable).

    So one option is to collect that data into R and then write it back out to SQL using dbWriteTable, as sketched below. But this is likely to be inefficient.
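
    As a minimal sketch of that collect-then-write route, reusing conn and hp_gt_100_tbl from the MRE above (note that every row round-trips through the R session):

    # Execute the query and pull the results into a local data frame,
    # then upload that data frame as a new BQ table.
    hp_gt_100_local <- dplyr::collect(hp_gt_100_tbl)
    DBI::dbWriteTable(conn, "hp_gt_100", hp_gt_100_local)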

    The approach I would recommend is creating a BigQuery INSERT INTO statement and passing this to dbExecute. Something like the following:

    # {db}, {schema}, and {tbl_name} are placeholders for the target table;
    # sql_render() returns the text of the lazy tbl's SELECT query.
    sql_query <- glue::glue(
      "INSERT INTO {db}.{schema}.{tbl_name}\n",
      dbplyr::sql_render(input_tbl)
    )

    result <- DBI::dbExecute(db_connection, as.character(sql_query))
    

    sql_render will take the definition of your current virtual table and return the text of its query. dbExecute will pass this command to the BigQuery server to be executed.
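
    Applied to the MRE above, the pattern might look like the sketch below. Two assumptions here: the fully qualified project.dataset.table reference (backtick-quoted, since the project ID contains hyphens), and that the target table hp_gt_100 must already exist with a compatible schema for INSERT INTO to succeed:

    # Render the lazy tbl's SELECT and prepend an INSERT INTO aimed at a
    # pre-existing target table (the fully qualified name is an assumption).
    insert_sql <- glue::glue(
      "INSERT INTO `{bq_test_project()}.test_dataset.hp_gt_100`\n",
      dbplyr::sql_render(hp_gt_100_tbl)
    )
    DBI::dbExecute(conn, as.character(insert_sql))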

    Please note, I am not familiar enough with the INSERT INTO syntax for BigQuery to guarantee that the syntax of sql_query above is correct, but I know the general approach works, as I use dbplyr and DBI extensively with SQL Server.
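
    Since hp_gt_100 does not yet exist in the MRE, a CREATE TABLE ... AS variant of the same idea may be the closer fit. Again a sketch, assuming BigQuery's DDL accepts the rendered query as-is:

    # Create the target table directly from the rendered query, avoiding
    # the need for a pre-existing table to INSERT INTO.
    create_sql <- glue::glue(
      "CREATE TABLE `{bq_test_project()}.test_dataset.hp_gt_100` AS\n",
      dbplyr::sql_render(hp_gt_100_tbl)
    )
    DBI::dbExecute(conn, as.character(create_sql))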