I've read through MonetDB.R/MonetDBLite posts on this site, the CRAN PDFs, as well as some of the documentation on the MonetDB site and confess to being at the bottom of what appears to be a steep learning curve.
The MonetDBLite documentation lists a handful of functions with the written caveat that they are not really meant to be called directly. When using a MonetDB.R function in R, a warning appears stating that it has been deprecated and please consider using MonetDBLite.
My data set is about 100 million rows x 60 factor columns currently occupying 30 data.frames that have been saved to rds files. I don't have enough memory to work with the whole of it in R, so I'd like to dump it all into MonetDB and extract only the data needed for specific tasks.
If I opt for a one table per data.frame paradigm, what R script will run a query on all tables in the MonetDB database? I assume this is some type of JOIN. Can this be done with CREATE VIEW?
If I opt for the one big table paradigm, what R script will allow me to append to an existing MonetDB table?
Any guidance would be appreciated. Here's what I've been able to cobble together so far:
library( MonetDB.R )
library( MonetDBLite )
library( DBI )
library( digest )
dbDir <- 'myDirectory'
#
# Create connection to database
#
conn <- dbConnect( MonetDBLite(), dbDir )
#
# Get data.table
#
f <- choose.files( default='directory to rds files',
caption="select files for processing:",
multi=FALSE )
dt <- readRDS( file=f )
# extract file name without extension
tblName <- sub( "^([^.]*).*", "\\1", basename( f ) )
#
# Add data.table to db
#
dbWriteTable( conn, tblName, dt )
#
# Count rows
#
qryStr <- paste( "SELECT COUNT(*) FROM", tblName )
dbGetQuery( conn, qryStr )
#
# After loading a few tables, stitch together into one view
#
viewStr <- "CREATE VIEW big_tbl AS
SELECT * FROM (
SELECT 'table1' AS type, table1.* FROM table1
UNION ALL
SELECT 'table2' AS type, table2.* FROM table2
UNION ALL
SELECT 'table3' AS type, table3.* FROM table3
) AS big_tbl_table"
dbGetQuery( conn, viewStr )
# dbListTables( conn )
#[1] "big_tbl" "table1"
#[3] "table2" "table3"
#
# Get VIEW into R
#
library( dplyr )
mdb <- src_monetdb( embedded=dbDir )
bigView <- tbl( mdb, 'big_tbl' ) # Error: select is not a character vector
bigView <- tbl( mdb, 'table1' )
#
# Disconnect from db
#
dbDisconnect( conn )
Do all the small tables have the same schema/columns? In that case I would put them all into one big table using dbWriteTable()
with the append
flag set to TRUE
.