Tags: r, oracle-database, dplyr, roracle, dbplyr

How to use dbplyr to store a query in an Oracle temporary table without ever storing the data locally?


I'm querying a data set that is too large to store on my personal computer. I want to use dbplyr to store the query result in a temporary table on the server, but I'm running into problems trying to do this.

Here is my code:

library("ROracle")
library("dplyr")

## Let dbplyr know that we're in an Oracle environment
sql_translate_env.OraConnection <- dbplyr:::sql_translate_env.Oracle
sql_select.OraConnection <- dbplyr:::sql_select.Oracle
sql_subquery.OraConnection <- dbplyr:::sql_subquery.Oracle

## Connect to database
drv = dbDriver("Oracle")
host = "myHost"
port = 1521
sid = "myServerName"
connect.string = paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SID=", sid, ")))", sep = "")
con = dbConnect(drv, username="myUsername", password="myPassword", dbname=connect.string)

## Create the query
remote = tbl(con, "myTable") %>%
  filter(GRP_ID == 12345)

## Attempt to run the query and store it in a temporary table, but this fails!
compute(remote)

This code gives me the following error message:

Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  : 
  ORA-14459: missing GLOBAL keyword

Has anyone been able to accomplish this task with an Oracle database and dbplyr and, if so, could you please share how you got it to work?


Solution

  • The latest version of dbplyr automatically recognizes ROracle connections and routes their translations to the Oracle translation, so once you upgrade you no longer need the workaround at the top of your code. Also, copy_to() is the command to use to cache the results of a query without downloading the data. copy_to() is smart enough to know whether it has been passed a data frame to upload or a SQL query whose result it should cache on the server.
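
With the upgraded dbplyr and the connection `con` from the question, the caching step might look like the sketch below. The temporary table name `MY_TMP_GRP` is illustrative, not required by the API:

```r
library("dplyr")
library("dbplyr")

## Build the lazy query as before; no data is pulled locally
remote <- tbl(con, "myTable") %>%
  filter(GRP_ID == 12345)

## copy_to() sees a remote tbl rather than a local data frame,
## so it caches the query result in a temporary table on the
## Oracle server instead of uploading anything
cached <- copy_to(con, remote, name = "MY_TMP_GRP", temporary = TRUE)

## `cached` is still a remote tbl; further dplyr verbs
## (filter, summarise, count, ...) run in-database
cached %>% count()
```

Because `cached` is lazy like any other remote tbl, you only pay the transfer cost if and when you call collect() on it.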