Is there a way to perform ASOF join in Snowflake / DuckDB / CedarDB based on nearest time in the row in the right table whose 'on' key is closest in absolute distance to the left table.
Currently I have following:
select
ice_cream_sales.brand
, ice_cream_sales.quantity
, temperature.recorded_temperature
, ice_cream_sales.amusement_park_id
from ice_cream_sales
asof join temperature
MATCH_CONDITION (ice_cream_sales.load_datetime >= temperature.recorded_datetime)
using (amusement_park_id);
This works, but only gets the temperature equal to or less than the ice cream sales load_datetime
. This is problematic since we only have temperature every hour. Instead we want to capture the temperature nearest to the load_datetime
of the ice cream sales.
One way can be to compute both distances, compute absolute time difference and choose the closest value:
Pseudocode:
WITH cte AS (
SELECT
ice_cream_sales.brand
,ice_cream_sales.quantity
,temperature.recorded_temperature
,ice_cream_sales.amusement_park_id
,ABS(DIFF(SECOND,ice_cream_sales.load_datetime,temperature.recorded_datetime)) AS diff
FROM ice_cream_sales
ASOF join temperature
MATCH_CONDITION (ice_cream_sales.load_datetime >= temperature.recorded_datetime)
using (amusement_park_id)
UNION ALL
SELECT
ice_cream_sales.brand
,ice_cream_sales.quantity
,temperature.recorded_temperature
,ice_cream_sales.amusement_park_id
,ABS(DIFF(SECOND,ice_cream_sales.load_datetime,temperature.recorded_datetime)) AS diff
FROM ice_cream_sales
ASOF join temperature
MATCH_CONDITION (ice_cream_sales.load_datetime <= temperature.recorded_datetime)
using (amusement_park_id)
)
SELECT *
FROM cte
QUALIFY ROW_NUMBER() OVER(PARTITION BY amusement_park_id, ... ORDER BY diff) = 1;