google-bigquerycount

How can I count the number of times a column has the same value in a sequence?


I am looking for a solution in BigQuery where I could count the number of times a column has the same value in a sequence. Currently, I have this table and I would like to have a new column that counts how often the value in flow column has occurred in the sequence.

Data :

with t as (session_id,page_group,page_name,steps_flow) as values (
(1017911,Home Page,Home Page,1),
(1017911,Site Search,Site Search,2),
(1017911,Range - PIP,Wall shelves,3),
(1017911,Range - PIP,Wall shelves,4),
(1017911,Range - PIP,Wall shelves,5),
(1017911,Range - PIP,Wall shelves,6),
(1017911,Site Search,Site Search,7),
(1017911,Site Search,Site Search,8),
(1017911,Ideas,Ideas,9),
(1017911,Range - PLP,EKTORP series,10),
(1017911,Range - PIP,Sofas,11)
)

(new col f0 is the expected output):

current data

expected output enter image description here

I am not able to find the right query.


Solution

  • This is a kind of gaps and islands problem and you might consider below

    SELECT * EXCEPT(gap, island), COUNT(1) OVER (PARTITION BY island) FROM (
      SELECT *, COUNTIF(gap) OVER w1 AS island FROM (
        SELECT *, page_name <> LAG(page_name) OVER w0 AS gap
          FROM sample_table
        WINDOW w0 AS (PARTITION BY session_id ORDER BY steps_flow)
      ) WINDOW w1 AS (PARTITION BY session_id ORDER BY steps_flow)
    );
    
    -- Query results
    
    +------------+-------------+---------------+------------+-----+
    | session_id | page_group  |   page_name   | steps_flow | f0_ |
    +------------+-------------+---------------+------------+-----+
    |    1017911 | Home Page   | Home Page     |          1 |   1 |
    |    1017911 | Site Search | Site Search   |          2 |   1 |
    |    1017911 | Range - PIP | Wall shelves  |          3 |   4 |
    |    1017911 | Range - PIP | Wall shelves  |          4 |   4 |
    |    1017911 | Range - PIP | Wall shelves  |          5 |   4 |
    |    1017911 | Range - PIP | Wall shelves  |          6 |   4 |
    |    1017911 | Site Search | Site Search   |          7 |   2 |
    |    1017911 | Site Search | Site Search   |          8 |   2 |
    |    1017911 | Ideas       | Ideas         |          9 |   1 |
    |    1017911 | Range - PLP | EKTORP series |         10 |   1 |
    |    1017911 | Range - PIP | Sofas         |         11 |   1 |
    +------------+-------------+---------------+------------+-----+
    

    You can test the query with below sample data.

    WITH sample_table AS (
      SELECT '1017911' session_id, 'Home Page' page_group, 'Home Page' page_name, 1 steps_flow  UNION ALL
      SELECT '1017911', 'Site Search', 'Site Search', 2 UNION ALL
      SELECT '1017911', 'Range - PIP', 'Wall shelves', 3 UNION ALL
      SELECT '1017911', 'Range - PIP', 'Wall shelves', 4 UNION ALL
      SELECT '1017911', 'Range - PIP', 'Wall shelves', 5 UNION ALL
      SELECT '1017911', 'Range - PIP', 'Wall shelves', 6 UNION ALL
      SELECT '1017911', 'Site Search', 'Site Search', 7 UNION ALL
      SELECT '1017911', 'Site Search', 'Site Search', 8 UNION ALL
      SELECT '1017911', 'Ideas', 'Ideas',9 UNION ALL
      SELECT '1017911', 'Range - PLP', 'EKTORP series',10 UNION ALL
      SELECT '1017911', 'Range - PIP', 'Sofas',11 
    )