rdplyrdbplyrr-dbi

How to read data from a database by chunk in R?


In dplyr, if tbl is a table in a database then head(tbl) gets translated into

select
  *
from
  tbl
limit 6

but there doesn't seem to be a way to use the offset keyword to read data in chunks. E.g. the equivalent of

select
  *
from
  tbl
limit 6 offset 5

doesn't seem possible with dplyr. In dbplyr, there is a do function to let you choose a chunk_size to bring back data chunk-by-chunk.

Is that the only way to do it in R? The solution doesn't have to in dplyr or the tidyverse.


Solution

  • Another approach would be to construct your own offset function. This assumes your database supports it, and the function is unlikely to be transferable to databases of other types.

    Something like the following:

    offset_head = function(table, num, offset){
    
      # get connection
      db_connection = table$src$con
    
      sql_query = build_sql(con = db_connection,
                          sql_render(table),
                          "\nLIMIT ", num,
                          "\nOFFSET ", offset
      )
    
      return(tbl(db_connection, sql(sql_query)))
    }