rdplyrrjavadbplyrrjdbc

Save dplyr query to different schema in dbplyr


I have a JDBC connection and would like to query data from one schema and save to another

library(tidyverse)
library(dbplyr)
library(rJava)
library(RJDBC)

# access the temp table in the native schema
tbl(conn, "temp")

temp_ed <- temp %*% mutate(new = 1)

# I would like to save temp_ed to a new schema "schmema_new"

I would like to use something like dbplyr::compute() but define the output schema specifically. It seems dbplyr::copy_to could be used, but would require bringing the data through the local machine.

I want to use something like RJDBC::dbSendUpdate() but which would ideally integrate nicely with the data manipulating pipeline above.


Solution

  • I do this using dbExecute from the DBI package.

    The key idea is to extract the query that defines the current remote table and make this a sub-query in a larger SQL query that writes the table. This requires that (1) the schema exists, (2) you have permission to write new tables, and (3) you know the correct SQL syntax.

    Doing this directly might look like:

    tbl(conn, "temp")
    temp_ed <- temp %*% mutate(new = 1)
    
    save_table_query = paste(
        "SELECT * INTO my_database.schema_new.my_table FROM (\n",
        dbplyr::sql_render(temp_ed),
        "\n) AS sub_query"
      )
    
    dbExecute(conn, as.character(save_table_query))
    

    INTO is the clause for writing a new table in SQL server (the flavour of SQL I use). You will need to find the equivalent clause for your database.

    In practice I use a custom function that looks something like this:

    write_to_database <- function(input_tbl, db, schema, tbl_name){
      # connection
      tbl_connection <- input_tbl$src$con
    
      # SQL query
      sql_query <- glue::glue(
        "SELECT *\n",
        "INTO {db}.{schema}.{tbl_name}\n",
        "FROM (\n",
        dbplyr::sql_render(input_tbl),
        "\n) AS sub_query"
      )
    
      result <- dbExecute(tbl_connection, as.character(sql_query))
    }
    

    Applying this in your context:

    tbl(conn, "temp")
    temp_ed <- temp %*% mutate(new = 1)
    write_to_database(temp_ed, "my_database", "schema_new", "my_table")