In Microsoft SQL Server you can define system times for tables and query them like this:
select top 10 * from table FOR SYSTEM_TIME AS OF '2024-01-01'
I would like to use R and dplyr to query the same table. Assuming I have a connection con
, how would I do this?
I have tried:
tbl(con, "table FOR SYSTEM_TIME AS OF '2024-01-01'")
and
tbl(con, "table") |>
filter(sql("FOR SYSTEM_TIME AS OF '2024-01-01'"))
Anyone know if it is possible?
The documentation on temporal tables states:
If the PERIOD columns aren't hidden, their values appear in a SELECT * query. If you specified PERIOD columns as HIDDEN, their values don't appear in a SELECT * query. When the PERIOD columns are hidden, you must reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.
It is possible that R may still observe the temporal columns even if they are hidden. I'd recommend something like the following to check.
remote_table = tbl(con, "table")
colnames(remote_table)
If the temporal columns are visible to R, then you should be able to include them in standard filter
clauses.
Otherwise, I would recommend making use of the fact that a dbplyr object is essentially two pieces: a database connection and a query. Hence, the following is plausible:
remote_table = tbl(con, sql("SELECT * FROM table FOR SYSTEM_TIME AS OF '2024-01-01'"))
This is similar to code in your question, the key difference being that this needs to be a complete query.
If this works, my inclination would be to make a custom function, something like:
system_time_as_of = function(table, date){
db_connection = table$src$con
sql_query = dbplyr::build_sql(
con = db_connection,
dbplyr::sql_render(table), " FOR SYSTEM_TIME AS OF ", date
)
return(dplyr::tbl(db_connection, dbplyr::sql(sql_query)))
}
# intended use
remote_table = tbl(con, "table")
remote_table = system_time_as_of(remote_table, '2024-01-01')