I am using tidyverse and ROracle. I have established a connection conn:
> conn
## User name:
## Connect string: IPIAMPR2.WORLD
## Server version: 19.0.0.0.0
## Server type: Oracle RDBMS
## Results processed: 60
## OCI prefetch: FALSE
## Bulk read: 1000
## Bulk write: 1000
## Statement cache size: 0
## Open results: 4
I have also created a table on the Oracle server side from a data frame:
df <- tibble(x = 1:3, date = as.Date("2023-10-29"))
dbWriteTable(conn, "TEST_DATE", df)
Then, I am using a lazy connection to access the table on the Oracle server side:
DF <- tbl(conn, "TEST_DATE")
DF
# # Source: table<TEST_DATE> [?? x 2]
# # Database: OraConnection
# x date
# <dbl> <dttm>
# 1 1 2023-10-29 00:00:00
# 2 2 2023-10-29 00:00:00
# 3 3 2023-10-29 00:00:00
How can I filter the Oracle table on a date?
(
DF
|> filter(date == as.Date("2023-10-29"))
)
## Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch, :
## ORA-00936: missing expression
I do not understand why this error happens. The following workaround based on as.character() works:
(
DF
|> filter(as.character(date) == "29/10/23 00:00:00,000000")
)
I am afraid that the call to as.character() would be extremely slow on a table with billions of rows. More precisely, I am working with very large tables indexed on the date column, and any operation applied to this column would take far too long.
/!\ I am working on French systems; I guess that is why the date becomes "29/10/23" with as.character().
Based on the translation you provided:
SELECT *
FROM ("TEST_DATE")
WHERE ("date" = DATE('2023-10-29'))
My best guess (I am not an Oracle user) is that there is a fault in the dbplyr translation. For Oracle, the TO_DATE function appears to be the conversion function (reference). Hence dbplyr is using the wrong function; this should be a bug report on dbplyr.
Regarding the error message: the DATE function in Oracle takes four arguments, Date(Format, Day, Month, Year) (reference). So I would guess that the error message ORA-00936: missing expression occurs because only a single argument is provided in the translation.
There is also a chance that the error is due to confusion by the database as to whether date is a column name or a function. Oracle does not accept 'table' as a table name; perhaps a similar dynamic is at work here.
So what to do about this? There are two options that seem plausible.
First, if you provide the date as a string, the database will likely convert it into a date to do the comparison (reference):
DF %>%
  filter(date == "2023-10-29")
I use this approach consistently in SQL Server. It requires you to provide the date in the default format of your database (likely YYYY-MM-DD or DD-MON-YYYY).
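As a quick check (a sketch that assumes the DF lazy table from the question; the exact quoting depends on your dbplyr version), show_query() should show the comparison being sent as a plain string literal, with the date column itself left untouched, so an index on that column can still be used as long as the string matches the session's date format:
# The column is compared directly; only the literal needs (implicit) conversion.
DF %>%
  filter(date == "2023-10-29") %>%
  show_query()
## expected to render roughly as:
## SELECT * FROM ("TEST_DATE") WHERE ("date" = '2023-10-29')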
Second, similar to @pietrodito's answer, if dbplyr does not have a translation defined for a function, it passes the function through untranslated. Hence you can do the following:
DF %>%
  filter(date == TO_DATE("2023-10-29", "YYYY-MM-DD"))
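TO_DATE() has no dbplyr translation, so only its arguments are quoted and the call itself is forwarded to Oracle. A sketch of the check (the rendered SQL may vary slightly by dbplyr version):
DF %>%
  filter(date == TO_DATE("2023-10-29", "YYYY-MM-DD")) %>%
  show_query()
## expected to render roughly as:
## SELECT * FROM ("TEST_DATE") WHERE ("date" = TO_DATE('2023-10-29', 'YYYY-MM-DD'))
As with the string comparison, no function is wrapped around the date column itself, so this should remain friendly to the index mentioned in the question.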
I tested both lubridate::as_date and base::as.Date; both gave me translation errors. This may have been fixed in the latest version of dbplyr; otherwise it is more evidence of a bug.
# setup
library(dbplyr)
library(dplyr)

df <- data.frame(
  id = 1:3,
  my_date = as.Date(c("2023-01-01", "2023-02-01", "2024-07-19"))
)
remote_df <- tbl_lazy(df, con = simulate_oracle())
Testing base::as.Date:
# this works
as.Date("2023-10-29", format = "%Y-%m-%d")

# this errors
remote_df %>%
  filter(my_date == as.Date("2023-10-29", format = "%Y-%m-%d")) %>%
  show_query()
Testing lubridate::as_date:
library(lubridate)

# this works
as_date("20231029", format = "%Y%m%d")

# this errors
remote_df %>%
  filter(my_date == as_date("2023-10-29", format = "%Y-%m-%d")) %>%
  show_query()
I also tested converting to a Date object prior to the dbplyr query:
converted_date <- as.Date("2023-10-29", format = "%Y-%m-%d")

remote_df %>%
  filter(my_date == converted_date) %>%
  show_query()
However, during translation this gets converted back to text:
SELECT *
FROM (`df`)
WHERE (`my_date` = '2023-10-29')
which is part of why I suggest relying on implicit conversion.
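If you start from an R Date object and still want an explicit date on the Oracle side, one way around this is to format the value locally and splice the resulting string into the untranslated TO_DATE() call. A sketch using the simulated setup above; !! is ordinary rlang injection and TO_DATE() is passed through exactly as in the second option (the rendered SQL may differ slightly by dbplyr version):
d <- as.Date("2023-10-29")

# format() turns the Date into text locally; !! splices that constant into the
# query, and the unknown TO_DATE() call is forwarded to the database unchanged.
remote_df %>%
  filter(my_date == TO_DATE(!!format(d, "%Y-%m-%d"), "YYYY-MM-DD")) %>%
  show_query()
## expected to render roughly as:
## SELECT * FROM (`df`) WHERE (`my_date` = TO_DATE('2023-10-29', 'YYYY-MM-DD'))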