r · sql-server · dbi · dbplyr

Store new permanent table in schema using compute


I want to use dbplyr syntax to do some JOIN / FILTER operations on some tables and store the results back to the Database without collecting it first.

From what I read, compute(..., temporary = FALSE, ...) should do that; however, I struggle with how to provide the fully qualified name (that is, database.schema.table_name) of the table where I want to store the result.

I know about DBI::Id and dbplyr::in_schema, but I do not know how to use them properly. An attempt with sql did at least what I wanted (it created the table) but resulted in a (spurious?) error.

What do I need to do?

Some NoReprex (not reproducible without my database):

library(DBI)
library(dbplyr)

con <- dbConnect(odbc::odbc(), "myserver")

## do __not__ collect the data
my_frame <- con %>%
   tbl(Id(catalog = "mydb", schema = "dbo", table = "mytable")) %>%
   inner_join(con %>% tbl(Id(catalog = "mydb", schema = "dbo", 
                             table = "yetanothertable")),
              "id")

compute(my_frame,
        # Id(catalog = "mydb", schema = "dbo", table = "mynewtable"), # (1)
        # in_schema("dbo", "mynewtable"),                             # (2)
        sql("mydb.dbo.mynewtable"),                                   # (3)
        FALSE)

I get a different error depending on the variant I am using:

# (1)
## Error in h(simpleError(msg, call)) : 
##   error in evaluating the argument 'conn' in selecting a method for function 
## 'dbQuoteIdentifier': argument "con" is missing, with no default

# (2)
## Error in escape(x$schema, con = con) : 
##   argument "con" is missing, with no default

# (3)
## Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.  
##            [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. 
## <SQL> 'SELECT *
## FROM (my.fully_qualified.name) "q02"
## WHERE (0 = 1)'

P.S.: I really want to be able to save the table with the fully qualified name, that is including the database name (though it is the same in this simplified example). So a dbConnect(..., database = <somedb>) won't solve my problem in the long run.

P.P.S: I am looking for a compute solution. I know I could construct the SQL myself, but I am really interested to see whether I can use the dbplyr abstraction layer for that.


Solution

  • I have previously preferred the solution that involves writing some SQL (as per this answer). But as you exclude this approach in your question, I tested and found a way to do it without writing SQL.

    We are going to use db_compute instead of compute.

    General setup

    library(DBI)
    library(dplyr)
    library(dbplyr)
    
    # connect to database
    connection_string = "..."
    db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
    
    # remote table
    remote_table = tbl(db_connection, from = in_schema("schema","table"))
    top_rows = remote_table %>%
      head()
    

    Testing compute

    top_rows %>% show_query()
    # <SQL>
    # SELECT TOP (6) *
    # FROM [database_name].[schema_name].[table_name]
    
    top_rows = top_rows %>%
      compute()
    # Created a temporary table named: #dbplyr_002
    
    top_rows %>% show_query()
    # <SQL>
    # SELECT *
    # FROM #dbplyr_002
    

    So we can see that compute writes a temporary table. Hence, if we did some complex processing (instead of just taking the top few rows), compute would be an effective way to store the processed table, so we can avoid repeating the complex processing every time we query it.

    But because it is temporary, the table should disappear when we disconnect from the database: DBI::dbDisconnect(db_connection).
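    As an aside not covered in the original answer: more recent dbplyr versions let you pass a schema-qualified name directly to compute() via its name argument, which may sidestep db_compute entirely. This is a hedged sketch — whether it works depends on your dbplyr version, so verify against your installation:

    ```r
    library(dplyr)
    library(dbplyr)

    # Sketch, assuming a dbplyr version whose compute() accepts an
    # in_schema() name; "schema" / "new_table" are placeholders.
    permanent_table = top_rows %>%
      compute(
        name = in_schema("schema", "new_table"),
        temporary = FALSE
      )
    ```

    If your version rejects the qualified name here, the db_compute approach below remains a workable fallback.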

    Testing db_compute

    out = db_compute(
      con = db_connection,
      table = in_schema("schema","new_table"),
      sql = sql_render(top_rows),
      temporary = FALSE
    )
    
    out
    # <IDENT> database_name.schema_name.new_table
    
    # reconnect
    new_remote_table = tbl(db_connection, from = in_schema("schema","new_table"))
    

    So we can now access the new (permanent) table from within R. I also checked and confirmed that the table exists in the database with an SQL query.

    Note that as db_compute has minimal documentation, it is not clear whether it is intended to be used in this way. I have tested the above and it worked. But without additional documentation, use at your own risk.
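    For completeness, the SQL-writing approach mentioned at the top can be sketched without hand-writing the whole query: let dbplyr render the query and wrap it in a T-SQL SELECT ... INTO. The table name below is a placeholder, and this is only a sketch of the idea, not part of the tested answer:

    ```r
    library(DBI)
    library(dbplyr)

    # Render the SQL that dbplyr would run for the lazy table,
    # then materialise it as a permanent table via SELECT ... INTO.
    query <- sql_render(top_rows, con = db_connection)
    dbExecute(
      db_connection,
      paste0("SELECT * INTO mydb.dbo.mynewtable FROM (", query, ") AS src")
    )
    ```

    This keeps the dbplyr abstraction for everything except the final materialisation step, at the cost of one line of hand-written SQL.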