snowflake-cloud-data-platformduckdbcedardb

ASOF join snowflake using 'nearest', similar to pandas.merge_asof


Is there a way to perform ASOF join in Snowflake / DuckDB / CedarDB based on nearest time in the row in the right table whose 'on' key is closest in absolute distance to the left table.

Currently I have following:

select 
  ice_cream_sales.brand
  , ice_cream_sales.quantity
  , temperature.recorded_temperature
  , ice_cream_sales.amusement_park_id
from ice_cream_sales
asof join temperature
MATCH_CONDITION (ice_cream_sales.load_datetime >= temperature.recorded_datetime)
using (amusement_park_id);

This works, but only gets the temperature equal to or less than the ice cream sales load_datetime. This is problematic since we only have temperature every hour. Instead we want to capture the temperature nearest to the load_datetime of the ice cream sales.


Solution

  • One way can be to compute both distances, compute absolute time difference and choose the closest value:

    Pseudocode:

    WITH cte AS (
      SELECT
       ice_cream_sales.brand
      ,ice_cream_sales.quantity
      ,temperature.recorded_temperature
      ,ice_cream_sales.amusement_park_id
      ,ABS(DIFF(SECOND,ice_cream_sales.load_datetime,temperature.recorded_datetime)) AS diff
      FROM ice_cream_sales
      ASOF join temperature
      MATCH_CONDITION (ice_cream_sales.load_datetime >= temperature.recorded_datetime)
        using (amusement_park_id)
      UNION ALL
      SELECT
       ice_cream_sales.brand
      ,ice_cream_sales.quantity
      ,temperature.recorded_temperature
      ,ice_cream_sales.amusement_park_id
      ,ABS(DIFF(SECOND,ice_cream_sales.load_datetime,temperature.recorded_datetime)) AS diff
      FROM ice_cream_sales
      ASOF join temperature
      MATCH_CONDITION (ice_cream_sales.load_datetime <= temperature.recorded_datetime)
        using (amusement_park_id)
    )
    SELECT *
    FROM cte
    QUALIFY ROW_NUMBER() OVER(PARTITION BY amusement_park_id, ... ORDER BY diff) = 1;