sqlgoogle-bigquerywindow-functionsstring-concatenation

How to concatenate N rows of a column based on a specific column value in Google BigQuery?


USER ID string_col
100001 Here
100001 there
100001 Apple
200002 this is
200002 that is
200002 Apple
200002 Cell 4

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate everything above that for every userID.

Below table is the output I'm looking for:

USER ID string_col Result
100001 Here null
100001 there null
100001 Apple Here There
200002 this is null
200002 that is null
200002 Apple this is that is
200002 Cell 4 null

Solution

  • In order to maintain proper ordering within concatenation - you should have extra column that defines that order - this is usually date or datetime or timestamp or just numeric, etc.

    So, below is for BigQuery Standard SQL assuming you do have such a column named for example as ts

    SELECT * EXCEPT(grp),
      CASE 
        WHEN string_col = 'Apple'
        THEN STRING_AGG(string_col, ' ') OVER win
      END AS Result
    FROM (
      SELECT *, 
        COUNTIF(string_col = 'Apple') OVER win AS grp
      FROM your_table
      WINDOW win AS (PARTITION BY user_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    )
    WINDOW win AS (PARTITION BY user_id, grp ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    

    if applied to sample data as in your question - output is

    enter image description here