roracle-databasedplyrdbplyrroracle

How to filter on a date with dbplyr and ROracle?


Setup

I am using tidyverse and ROracle.

I have established a connection conn:

> conn
## User name:              
##   Connect string:        IPIAMPR2.WORLD 
## Server version:        19.0.0.0.0 
## Server type:           Oracle RDBMS 
## Results processed:     60 
## OCI prefetch:          FALSE 
## Bulk read:             1000 
## Bulk write:            1000 
## Statement cache size:  0 
## Open results:          4 

I have also created a table on the oracle server side from a dataframe:

df <- tibble(x = 1:3, date = as.Date("2023-10-29"))

dbWriteTable(conn, "TEST_DATE", df)

Then, I am using a lazy connection to access the table on the oracle server side:

DF <- tbl(conn, "TEST_DATE")

DF
# # Source:   table<TEST_DATE> [?? x 2]
# # Database: OraConnection
# x date               
# <dbl> <dttm>             
#    1  1 2023-10-29 00:00:00
#    2  2 2023-10-29 00:00:00
#    3  3 2023-10-29 00:00:00

Question

How to filter the oracle table on a date?

Issue

(
  DF
  |> filter(date == as.Date("2023-10-29"))
)
## Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  : 
## ORA-00936: missing expression

I am not getting why this error happens...

A workaround that I suspect to be slow on large tables

(
  DF
  |> filter(as.character(date) == "29/10/23 00:00:00,000000"))
)

I am afraid that the call to as.character would be extremely slow on a table with billions of lines. More precisely, I am working with very large tables indexed on the date column, and any operation on this column will be too long.

/!\ I am working on french systems, I guess that is why the date become "29/10/23" with as.character()


Solution

  • Cause:

    Based on the translation you provided:

    SELECT *
    FROM ("TEST_DATE")
    WHERE ("date" = DATE('2023-10-29'))
    

    My best guess (I am not an Oracle user) is that there is a fault in the dbplyr translation. For Oracle, the TO_DATE function appears to be the conversion function (reference). Hence dbplyr is using the wrong function - this should be a bug report on dbplyr.

    Regarding the error message:

    The DATE function in Oracle takes four arguments: Date (Format, Day, Month, Year) (reference). So I would guess that the error message

    ORA-00936: missing expression

    is because only a single argument is provided in the translation.

    There is also a chance that the error is due to confusion by the database as to whether date is a column name or a function. Oracle does not accept 'table' as a table name, perhaps a similar dynamic is at work here?


    So what to do about this?

    There are two options that seem plausible:

    (1) rely on implicit conversion

    If you provide the date as a string, the database will likely convert this into a date to do the comparison (reference).

    DF %>%
      filter(date == "2023-10-29")
    

    I use this approach in SQL Server consistently. This will require you to provide the date in the default format of your database (likely YYYY-MM-DD or DD-MON-YYYY).

    (2) use non-translation

    Similar to @pietrodito's answer, if dbplyr does not have a translation defined, then it will pass the command untranslated. Hence you can do the following:

    DF %>%
      filter(date == TO_DATE("2023-10-29", "YYYY-MM-DD"))
    

    Other info

    I tested both lubridate::as_date and base::as.Date. Both gave me translation errors. This may have been fixed in the latest version of dbplyr, otherwise it is more evidence of a bug.

    # setup
    library(dbplyr)
    library(dplyr)
    
    df = data.frame(
      id = 1:3,
      my_date = as.Date("2023-01-01", "2023-02-01", "2024-07-19")
    )
    
    remote_df = tbl_lazy(df, con = simulate_oracle())
    

    Testing base::as.Date:

    # this works
    as.Date("2023-10-29", format = "%Y-%m-%d")
    # this errors
    remote_df %>%
      filter(my_date == as.Date("2023-10-29", format = "%Y-%m-%d")) %>%
      show_query()
    

    Testing lubraidate::as_date:

    library(lubridate)
    # this works
    as_date("20231029", format = "%Y%m%d")
    # this errors
    remote_df %>%
      filter(my_date == as_date("2023-10-29", format = "%Y-%m-%d")) %>%
      show_query()
    

    I also tested converting to date format prior to the dbplyr query:

    converted_date = as.Date("2023-10-29", format = "%Y-%m-%d")
    
    remote_df %>%
      filter(my_date == converted_date) %>%
      show_query()
    

    However, during translation this gets converted back to text:

    SELECT *
    FROM (`df`) 
    WHERE (`my_date` = '2023-10-29')
    

    which is part of why I suggest relying on implicit conversion.