rodbcdbidbplyrroracle

Include Database Connection in my R package


Every day, I connect to a corporate Database from within R. Currently, I have an R-script which sets up the connection and undertakes all pre-analysis data preparation (mutates and joins). I would like to include this in an R-package DWH, so I can share this process with colleagues.

Here is some sample code:

creds <- read.csv2('C:/creds.csv')

con <- ROracle::dbConnect(drv = DBI::dbDriver("Oracle"),
                               username = creds$user, password= creds$pwd,
                               dbname = 'DWH',
                               prefetch = FALSE, bulk_read = 1000L, stmt_cache = 0L,
                               external_credentials = FALSE, sysdba = FALSE)

PREST <- dplyr::tbl(con, dbplyr::in_schema('ADWH', 'PREST')) %>%
    transmute(date = A_DWH_DATE_TODAY,
              amount = A_DWH_POL_IND_COLL_AMOUNT)

Ideally, I would like to load the package with library(DWH), and then the PREST object should be available to everyone for further use, e.g.

library(DWH)
PREST %>% filter(date > 201912) 

This works perfectly well while sourcing the R-script. How can I implement this inside an R-package? I am using ROracle, but this should be the same process with any other odbc database connection.


Solution

  • While I agree with @r2evans that this is not a good design decision, if you are committed to doing it this way an approach you could take is a function that assigns variables to the global environment (as per this question).

    This would look something like:

    create_globals = function(){
        creds <- read.csv2('C:/creds.csv')
    
        con <- ROracle::dbConnect(drv = DBI::dbDriver("Oracle"),
                                       username = creds$user, password= creds$pwd,
                                       dbname = 'DWH',
                                       prefetch = FALSE, bulk_read = 1000L, stmt_cache = 0L,
                                       external_credentials = FALSE, sysdba = FALSE)
    
        PREST <- dplyr::tbl(con, dbplyr::in_schema('ADWH', 'PREST')) %>%
            transmute(date = A_DWH_DATE_TODAY,
                      amount = A_DWH_POL_IND_COLL_AMOUNT)
    
        assign("creds", cred, envir = .GlobalEnv)
        assign("con", con, envir = .GlobalEnv)
        assign("PREST", PREST, envir = .GlobalEnv)
    }
    

    Then your package use looks like:

    library(DWH)
    create_globals()
    PREST %>% filter(date > 201912)