roracle-databaser-dbiroracle

multi row data frame as bound parameters in `package:ROracle` SELECT statements


I am trying to use a parametrized query multiple times with package:ROracle.

library(DBI)
conn <- dbConnect(ROracle::Oracle(), ...)

statement <- "select * from bigtable where name = :1"
name <- c("Bob", "Alice", "Carol")

I expected the query to be executed for each row of the data frame (i.e. as for to the param argument in DBI::dbGetQuery()), but it is not the case:

dbGetQuery(conn, statement, data.frame(name))
# Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch,  : 
#   bind data has too many rows

I can loop over the names, but I think it may be less efficient (I am new to RDBMS but I read about optimization/caching):

lapply(name, function(i) dbGetQuery(conn, statement, data.frame(i)))

How can it be done efficiently?

There are some related questions that does not solve the problem:


Solution

  • I think there are a couple of ways to approach this.

    1. Insert (to a temp table), single-query on a join:

      mynames <- data.frame(name = c("Bob", "Alice", "Carol"))
      dbWriteTable(con, "mytemp", mynames)
      alldat <- dbGetQuery(con, "
        select bt.*
        from bigtable bt
          inner join mytemp my on bt.name=my.name")
      

      This method has the advantage of being a single query. If you need it split by name later, you can always use R's split.

    2. (Apparently this doesn't work in ROracle?) Create a stored-statement and repeatedly bind/fetch:

      res <- dbSendStatement(con, "select * from bigtable where name = :1")
      eachdat <- lapply(name, function(nm) { dbBind(res, list(nm)); dbFetch(res); })
      dbClearResult(res)
      

      I do not tend to use this method, though that's as much preference as anything else.

    3. Use SELECT ... NAME IN (...)

      statement <- paste0("select * from bigtable where name in (", paste(paste0(":", seq_along(name)), collapse = ","), ")")
      alldat <- dbGetQuery(con, statement, list(name))
      

      The paste(...) in the statement produces ":1,:2,:3" (counting along the length of name), which should be compatible with your original query mechanism.

    Having said that ... every server has its own optimization methods, but I suspect that with your lapply(.., \(z) dbGetQuery(..)), your server will cache the optimized version since your query does not change between iterations. A DBA more familiar with Oracle might be able to clarify or refute my claim, but I have a hard time believing that it does not cache the query for some time (in a sql statement cache).