sqlrdplyrtidyversedbplyr

Is there a way to add a LIKE in a dbplyr inner_join SQL translation?


This is a follow-up question to this one: How to join dataframes using a prefix as a match?

I have those two tables on a SQL server side:

data <- memdb_frame(
  x = c("ANOTHER", "COMMON", "ZEBRA")
)

selection <- memdb_frame(
  x_prefix = c("ANO", "B", "CO"),
  type = c("One type", "Other type", "Other type")
)

I want to join them and keep lines where the x_prefix is a prefix of x.

I am trying to build the appropriate SQL query with dbplyr, starting from here:

inner_join(data, selection, by = c(x = "x_prefix")) |>
  show_query()

which produces:

<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` = `RHS`.`x_prefix`)

The query I need is:

<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` LIKE `RHS`.`x_prefix` || '%')

I have read that Function translation vignette but I still need help.


Solution

  • Another way to attempt this is using a cross-join and filter. I use this approach with dbplyr, because there is less flexibility for specifying joins in R than there is in SQL.

    # add placeholder column of ones to enable cross-join
    data = data %>%
      mutate(ones = 1)
    selection = selection %>%
      mutate(ones = 1)
    # not necessary if there is another column you already intend to join by
    
    cross_join = inner_join(data, selection, by = "ones") %>%
      select(-ones)
    
    result = cross_join %>%
      filter(x_prefix == LEFT(x, nchar(x_prefix)))
    

    I have used LEFT in capitals for it to be passed untranslated. There is a translation for substr but it would only accept numbers as inputs (and errored if I tried to use nchar(x_prefix) as the number of characters).

    Another option for the filter would be to provide the condition as text within sql() to pass SQL code directly:

    result = cross_join %>%
      filter(sql("x LIKE x_prefix || '%'"))