sqlgoogle-bigquerypartition-by

An efficient way to backward fill group values by type


I have a logs table with id, name, type, and date columns. And I want to create a new column which represents next id value from the 2nd type. Query must be grouped by name and only type1 values filled based on next type2 value.

logs table:

id name type date
1 name1 type1 2022-01-01
2 name1 type1 2022-01-02
3 name1 type2 2022-01-03
4 name1 type1 2022-01-03
5 name1 type2 2022-01-04
6 name1 type1 2022-01-05
7 name2 type1 2022-01-03
8 name2 type2 2022-01-08

desirable output:

id name type date type2_id
1 name1 type1 2022-01-01 3
2 name1 type1 2022-01-02 3
4 name1 type1 2022-01-03 5
6 name1 type1 2022-01-05
7 name2 type1 2022-01-03 8

I was able to do it using joins and LAG function. However, maybe someone could help me make it more efficient?

WITH logs AS (
  SELECT 1 AS id, 'name1' AS name, 'type1' AS type, '2022-01-01' AS date,
  UNION ALL
  SELECT 2 AS id, 'name1' AS name, 'type1' AS type, '2022-01-02' AS date,
  UNION ALL
  SELECT 3 AS id, 'name1' AS name, 'type2' AS type, '2022-01-03' AS date,
  UNION ALL
  SELECT 4 AS id, 'name1' AS name, 'type1' AS type, '2022-01-03' AS date,
  UNION ALL
  SELECT 5 AS id, 'name1' AS name, 'type2' AS type, '2022-01-04' AS date,
  UNION ALL
  SELECT 6 AS id, 'name1' AS name, 'type1' AS type, '2022-01-05' AS date,
  UNION ALL
  SELECT 7 AS id, 'name2' AS name, 'type1' AS type, '2022-01-03' AS date,
  UNION ALL
  SELECT 8 AS id, 'name2' AS name, 'type2' AS type, '2022-01-08' AS date,
)


SELECT
  t1.id,
  t1.name,
  t1.type,
  t1.date,
  t2.id AS type2_id,

FROM (
  SELECT *,
  FROM logs
  WHERE logs.type = 'type1'
) AS t1

LEFT JOIN (
  SELECT *,
    IFNULL(LAG(logs.date) OVER(PARTITION BY logs.name, logs.type ORDER BY logs.date), '2000-01-1') AS date_prev,
  FROM logs
  WHERE type = 'type2'
) AS t2
  ON t2.name = t1.name
    AND t2.date > t1.date
    AND t2.date_prev <= t1.date

ORDER BY t1.name, t1.date

Solution

  • Consider below approach

    select *, first_value(if(type='type2', id, null) ignore nulls) over win as type2_id
    from logs
    qualify type = 'type1'
    window win as (partition by name order by date rows between 1 following and unbounded following)           
    

    if applied to sample data in your question - output is

    enter image description here