This is a follow-up question to this one: How to join dataframes using a prefix as a match?
I have those two tables on a SQL server side:
data <- memdb_frame(
x = c("ANOTHER", "COMMON", "ZEBRA")
)
selection <- memdb_frame(
x_prefix = c("ANO", "B", "CO"),
type = c("One type", "Other type", "Other type")
)
I want to join them and keep lines where the x_prefix
is a prefix of x
.
I am trying to build the appropriate SQL query with dbplyr, starting from here:
inner_join(data, selection, by = c(x = "x_prefix")) |>
show_query()
which produces:
<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` = `RHS`.`x_prefix`)
<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` LIKE `RHS`.`x_prefix` || '%')
I have read that Function translation vignette but I still need help.
Another way to attempt this is using a cross-join and filter. I use this approach with dbplyr, because there is less flexibility for specifying joins in R than there is in SQL.
# add placeholder column of ones to enable cross-join
data = data %>%
mutate(ones = 1)
selection = selection %>%
mutate(ones = 1)
# not necessary if there is another column you already intend to join by
cross_join = inner_join(data, selection, by = "ones") %>%
select(-ones)
result = cross_join %>%
filter(x_prefix == LEFT(x, nchar(x_prefix)))
I have used LEFT
in capitals for it to be passed untranslated. There is a translation for substr
but it would only accept numbers as inputs (and errored if I tried to use nchar(x_prefix)
as the number of characters).
Another option for the filter would be to provide the condition as text within sql()
to pass SQL code directly:
result = cross_join %>%
filter(sql("x LIKE x_prefix || '%'"))