rodbc

R SQLove for ODBC Connection for Complex Queries


I am trying to get the results of a complex query into r for manipulation/ETL. I found SQLove and it sounds like it is exactly the package that would make my life easy. Trying to test it on a simple script-

if (!require("pacman")
) install.packages("pacman")

pacman::p_load(
  #add list of libraries here
  SQLove,
  odbc,
  RJDBC
)

con <- 
  dbConnect(
    odbc::odbc(),
    driver = "SQL Server",
    server = "ClarDbPrd_Alias",
    database = "Clarity"
  )

d1 <- 
  dbGetMultiQuery(
    connection = con,
    sql_file_path = "S:/QITeam/DataAnalytics/Projects/Frailty/Scripts/test.sql"
    )

The SQL file is a simple Select * to test if the package works

SELECT * FROM CLARITY_DEP;

When I run it I get the below error:

Error: unable to find an inherited method for function ‘dbSendUpdate’ for signature ‘conn = "Microsoft SQL Server", statement = "SQL"’

Trying to research the issue, and it looks like dbSendUpdate from the package RJDBC only works for JBDC and not ODBC. Is there a workaround or a different package that makes it this easy to run a script that is in a .sql file type for ODBC? I love the part where it says it cleans up comments and everything as well (both -- and multiline /**/)

https://samkerns.r-universe.dev/articles/SQLove/SQLove.html


Solution

  • I think it is outdated after some research on GitHub for the function.

    https://github.com/samkerns/SQLove/blob/main/R/dbGetMultiQuery.R

    So I took the code, tweaked it a bit for my purposes, and on a more straightforward SQL script I have for work, it popped results with no problem

    # Check to see if pacman is installed
    ## Install if it is not loaded
    if (!require("pacman")
    ) install.packages("pacman")
    
    # Get needed libraries
    pacman::p_load(
      odbc,
      DBI,
      stringr
    )
    
    # Set up connection
    con <-
      dbConnect(
        odbc::odbc(),
        driver = "SQL Server",
        server = "ClarDbPrd_Alias",
        database = "Clarity"
      )
    
    # Function to read in SQL file execute all commands and return final query
    dbGetMultiQuery <- 
      function(
        connection, # connection set up from dbConnect
        sql_file_path # path to the SQL file
        ) {
        
        #Reading in the SQL file
        sql_file <-
          readr::read_file(sql_file_path)
        
        #Removing all comments /* and --
        ## /**/ comments
        sql_file <- base::gsub("/\\*.*?\\*/", "", sql_file)
        
        ## -- comments
        sql_file <- base::gsub("--.*?\\r", "\\\r", sql_file)
        
        ## special characters
        sql_file <- str_replace_all(sql_file, "[\r\n]", "  ")
        sql_file <- str_replace_all(sql_file, "[\t]", " ")
    
        ## white space before and after to make debugging easier
        sql_file <- trimws(sql_file)
        
        # Splitting the SQL file into individual queries
        ## ignore '; ' because I use that in STRING_AGG
        sql_list <- base::strsplit(sql_file, "(?<!')\\;(?! ')", perl = TRUE)
        
        # Get the number of queries
        query_length <- base::lengths(sql_list)
        
        #If only 1 query is available, it's a SELECT statement, use DBI::dbGetQuery
        if (query_length == 1) {
          DBI::dbGetQuery(connection, sql_list[[1]][[1]])
          
          #If more than 1 query is available, dbExecute for all but final statement
          } else {
            for (i in c(1:(query_length - 1))) {
              DBI::dbExecute(connection, DBI::SQL(sql_list[[1]][[i]]))
              print(paste("Statement", i, "of", query_length, "complete"))
              }
            #Create dataframe from final query statement
            DBI::dbGetQuery(connection, sql_list[[1]][[query_length]])
          }
        }
    
    # Test
    df1 <- 
      dbGetMultiQuery(
        connection = con, 
        sql_file_path = "S:/QITeam/DataAnalytics/Projects/Frailty/Scripts/CasesGeriatric_1c.sql"
        )
    

    I hope people run into this if they are looking to do multiple queries in R because I wish I had come across an answer like this when I first started searching on how to do this in R :)