rdplyrdbplyr

Dbplyr: combine two tables and add the to the result to the database without loading them in memory


Please have a look at the simple script at the end of the post. I have a database containing two tables which I combine using union_all. Is there a way to add the result to the database without collecting the data i.e. loading them into memory? Many thanks!




library(tidyverse)
library(DBI) # main DB interface
library(dbplyr) # dplyr back-end for DBs
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
library(RSQLite)


##create the databases

df1 <- tibble(x=1:20,y=rep(c("a", "b"), 10))


df2 <- tibble(x=101:120,y=rep(c("d", "e"), 10))




con <- dbConnect(drv=RSQLite::SQLite(), dbname="db.sqlite")

dbWriteTable(con,"mydata1",df1, overwrite=T)
dbWriteTable(con,"mydata2",df2, overwrite=T)

dbDisconnect(con) # closes our DB connection


con <- dbConnect(drv=RSQLite::SQLite(), dbname="db.sqlite")

mydb1 <- tbl(con, "mydata1")
mydb2 <- tbl(con, "mydata2")


mydb12 <- union_all(mydb1,mydb2)

#is there a way to add the union of mydb1 and mydb2 to the database without explicitly collecting the data?

Created on 2020-12-24 by the reprex package (v0.3.0)


Solution

  • Since you're dealing with SQL, just use SQL.

    collect(mydb1) %>%
      nrow()
    # [1] 20
    DBI::dbExecute(con, "insert into mydata1 select * from mydata2")
    # [1] 20
    collect(mydb1) %>%
      nrow()
    # [1] 40
    collect(mydb1) %>%
      tail()
    # # A tibble: 6 x 2
    #       x y    
    #   <int> <chr>
    # 1   115 d    
    # 2   116 e    
    # 3   117 d    
    # 4   118 e    
    # 5   119 d    
    # 6   120 e    
    

    If you want the combined data in a new table, then here's an alternative.

    DBI::dbExecute(con, "
      create table mydata12 as
        select * from mydata2 union all select * from mydata1")