I want to use dbplyr syntax to do some JOIN / FILTER operations on some tables and store the results back to the database without collecting them first. From what I read, compute(..., temporary = FALSE, ...) should do that, but I struggle with how to provide the fully qualified name (that is, database.schema.table_name) of the table where I want to store the results.
I know about DBI::Id and dbplyr::in_schema, but I do not know how to use them properly. An attempt with sql did at least what I wanted (it created the table) but resulted in a (spurious?) error. What do I need to do?
library(DBI)
library(dbplyr)

con <- dbConnect(odbc::odbc(), "myserver")

## do __not__ collect the data
my_frame <- con %>%
  tbl(Id(catalog = "mydb", schema = "dbo", table = "mytable")) %>%
  inner_join(con %>% tbl(Id(catalog = "mydb", schema = "dbo",
                            table = "yetanothertable")),
             "id")

compute(my_frame,
        # Id(catalog = "mydb", schema = "dbo", table = "mynewtable"), # (1)
        # in_schema("dbo", "mynewtable"),                             # (2)
        sql("mydb.dbo.mynewtable"),                                   # (3)
        temporary = FALSE)
I get a different error depending on the variant I am using:
# (1)
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'conn' in selecting a method for function
## 'dbQuoteIdentifier': argument "con" is missing, with no default
# (2)
## Error in escape(x$schema, con = con) :
## argument "con" is missing, with no default
# (3)
## Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.
## [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared.
## <SQL> 'SELECT *
## FROM (my.fully_qualified.name) "q02"
## WHERE (0 = 1)'
P.S.: I really want to be able to save the table with the fully qualified name, that is, including the database name (though it is the same in this simplified example). So a dbConnect(..., database = <somedb>) won't solve my problem in the long run.
P.P.S.: I am looking for a compute solution. I know I could construct the SQL myself, but I am really interested to see whether I can use the dbplyr abstraction layer for that.
I have previously preferred the solution that involves writing some SQL (as per this answer). But as you exclude this approach in your question, I tested and found a way to do it without writing SQL.
We are going to use db_compute instead of compute. The documentation for compute states that "compute() stores results in a remote temporary table", so I took this to mean we cannot write permanent tables using compute. db_compute says very little, but it appears alongside db_copy_to, which has a purpose similar to what we are looking for. So it was worth trying (and it works).
General setup
library(DBI)
library(dplyr)
library(dbplyr)

# connect to database
connection_string = "..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)

# remote table
remote_table = tbl(db_connection, from = in_schema("schema", "table"))

top_rows = remote_table %>%
  head()
Testing compute
top_rows %>% show_query()
# <SQL>
# SELECT TOP (6) *
# FROM [database_name].[schema_name].[table_name]
top_rows = top_rows %>%
compute()
# Created a temporary table named: #dbplyr_002
top_rows %>% show_query()
# <SQL>
# SELECT *
# FROM #dbplyr_002
So we can see compute writes a temporary table. Hence, if we did some complex processing (instead of just taking the top few rows), compute would be an effective way to store the processed table so we can avoid repeating the complex processing every time we query it.
But because it is temporary, the table will disappear when we disconnect from the database with DBI::dbDisconnect(db_connection).
Testing db_compute
out = db_compute(
con = db_connection,
table = in_schema("schema","new_table"),
sql = sql_render(top_rows),
temporary = FALSE
)
out
# <IDENT> database_name.schema_name.new_table
# point a new tbl at the permanent table
new_remote_table = tbl(db_connection, from = in_schema("schema", "new_table"))
So we can now access the new (permanent) table from within R. I also checked and confirmed that the table exists in the database with an SQL query.
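That existence check can also be done from R. The sketch below shows two options; the catalog/schema/table names are the placeholders from this example, and the second query assumes SQL Server's INFORMATION_SCHEMA views:

library(DBI)

# option 1: DBI with a fully qualified identifier
dbExistsTable(db_connection,
              Id(catalog = "database_name", schema = "schema", table = "new_table"))

# option 2: a plain SQL query against the system catalog
dbGetQuery(db_connection,
           "SELECT COUNT(*) AS n
            FROM database_name.INFORMATION_SCHEMA.TABLES
            WHERE TABLE_SCHEMA = 'schema' AND TABLE_NAME = 'new_table'")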
Note that as db_compute has minimal documentation, it is not clear whether it is intended to be used in this way. I have tested the above and it works, but without additional documentation, use at your own risk.