sqlrsql-serverdbplyr

Use dplyr to query a temporal table at a point in time


In Microsoft SQL Server you can define system times for tables and query them like this:

select top 10 * from table FOR SYSTEM_TIME AS OF '2024-01-01'

I would like to use R and dplyr to query the same table. Assuming I have a connection con, how would I do this?

I have tried:

tbl(con, "table FOR SYSTEM_TIME AS OF '2024-01-01'")

and

tbl(con, "table") |>
   filter(sql("FOR SYSTEM_TIME AS OF '2024-01-01'"))

Anyone know if it is possible?


Solution

  • The documentation on temporal tables states:

    If the PERIOD columns aren't hidden, their values appear in a SELECT * query. If you specified PERIOD columns as HIDDEN, their values don't appear in a SELECT * query. When the PERIOD columns are hidden, you must reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.

    It is possible that R may still observe the temporal columns even if they are hidden. I'd recommend something like the following to check.

    remote_table = tbl(con, "table")
    colnames(remote_table)
    

    If the temporal columns are visible to R, then you should be able to include them in standard filter clauses.


    Otherwise, I would recommend making use of the fact that a dbplyr object is essentially two pieces: a database connection and a query. Hence, the following is plausible:

    remote_table = tbl(con, sql("SELECT * FROM table FOR SYSTEM_TIME AS OF '2024-01-01'"))
    

    This is similar to code in your question, the key difference being that this needs to be a complete query.


    If this works, my inclination would be to make a custom function, something like:

    system_time_as_of = function(table, date){
    
      db_connection = table$src$con
      
      sql_query = dbplyr::build_sql(
        con = db_connection,
        dbplyr::sql_render(table), " FOR SYSTEM_TIME AS OF ", date
      )
      return(dplyr::tbl(db_connection, dbplyr::sql(sql_query)))
    }
    
    # intended use
    remote_table = tbl(con, "table")
    remote_table = system_time_as_of(remote_table, '2024-01-01')