USER ID | string_col |
---|---|
100001 | Here |
100001 | there |
100001 | Apple |
200002 | this is |
200002 | that is |
200002 | Apple |
200002 | Cell 4 |
That is my raw table & I want to use the word "Apple" as a trigger word to concatenate everything above that for every userID.
Below table is the output I'm looking for:
USER ID | string_col | Result |
---|---|---|
100001 | Here | null |
100001 | there | null |
100001 | Apple | Here There |
200002 | this is | null |
200002 | that is | null |
200002 | Apple | this is that is |
200002 | Cell 4 | null |
In order to maintain proper ordering within concatenation - you should have extra column that defines that order - this is usually date or datetime or timestamp or just numeric, etc.
So, below is for BigQuery Standard SQL assuming you do have such a column named for example as ts
SELECT * EXCEPT(grp),
CASE
WHEN string_col = 'Apple'
THEN STRING_AGG(string_col, ' ') OVER win
END AS Result
FROM (
SELECT *,
COUNTIF(string_col = 'Apple') OVER win AS grp
FROM your_table
WINDOW win AS (PARTITION BY user_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win AS (PARTITION BY user_id, grp ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
if applied to sample data as in your question - output is