postgresqlcrosstabclickstream

Clickstream: PostgreSQL Crosstab for UTM Path by User_ID


I have some Clickstream data I would like to analyze to determine what paid campaigns are contributing to the most conversions.

I have a table in the database that has the following:

user_id |   sent_at        |   campaign_name    |  last_click_attribution   
  101   | 2018-10-01 13:04 |   Google_Branded   |  Facebook_Focus
  101   | 2018-10-01 13:07 |   Google_Branded   |  Facebook_Focus 
  101   | 2018-10-02 13:09 |   Facebook_Focus   |  Facebook_Focus
  102   | 2018-09-25 13:04 |   Google_Focus     |  Google_Branded
  102   | 2018-09-27 09:24 |   Google_Branded   |  Google_Branded
  102   | 2018-10-01 11:25 |   Google_Branded   |  Google_Branded
  103   | 2018-09-27 13:04 |   Google_Branded   |  Google_Branded
  103   | 2018-09-28 09:15 |   Google_Branded   |  Google_Branded
  103   | 2018-09-29 18:34 |   Google_Branded   |  Google_Branded
  103   | 2018-09-30 21:02 |   Google_Branded   |  Google_Branded

The campaign name is the campaign associated with the ad they clicked to get to our website. The last click attribution is the ad they clicked last before they created a user account.

I would like to create a PostgreSQL query that will have the following:

user_id |   last_click_attribution |   second_last_ad    |  third_last_ad  |....   
  101   | Facebook_Focus           |   Google_Branded    |  Google_Branded
  102   | Google_Branded           |   Google_Branded    |  Google Focus 
  103   | Google_Branded           |   Google_Branded    |  Google_Branded

I presume there is a way to do this with crosstab or perhaps joining two views but I am not certain how to accomplish it.

Thanks for the help!

If you have any other suggestions as to analyses to do on clickstream data that are valuable, and SQL query examples to reference, those would also be much appreciated.


Solution

  • You can try to use make row number in a subquery, then use condition aggregate function to make it.

    CREATE TABLE T(
       user_id int,
       sent_at timestamp,
       campaign_name varchar(50)
    );
    
    
    INSERT INTO T VALUES (101, '2018-10-01 13:04','Google_Branded');   
    INSERT INTO T VALUES (101, '2018-10-01 13:07','Google_Branded');   
    INSERT INTO T VALUES (101, '2018-10-02 13:09','Facebook_Focus');   
    INSERT INTO T VALUES (102, '2018-09-25 13:04','Google_Focus');     
    INSERT INTO T VALUES (102, '2018-09-27 09:24','Google_Branded');   
    INSERT INTO T VALUES (102, '2018-10-01 11:25','Google_Branded');   
    INSERT INTO T VALUES (103, '2018-09-27 13:04','Google_Branded');   
    INSERT INTO T VALUES (103, '2018-09-28 09:15','Google_Branded');   
    INSERT INTO T VALUES (103, '2018-09-29 18:34','Google_Branded');   
    INSERT INTO T VALUES (103, '2018-09-30 21:02','Google_Branded');   
    

    Query 1:

    SELECT  user_id,
            MAX(CASE WHEN rn = 1 then campaign_name end) last_click_attribution,
            MAX(CASE WHEN rn = 2 then campaign_name end) second_last_ad,
            MAX(CASE WHEN rn = 3 then campaign_name end) third_last_ad,
            MAX(CASE WHEN rn = 4 then campaign_name end) fourth_last_ad
    FROM (
      select *,row_number() over(partition by user_id ORDER by sent_at desc) rn
      from T
    ) t1
    group by user_id
    

    Results:

    | user_id | last_click_attribution | second_last_ad |  third_last_ad | fourth_last_ad |
    |---------|------------------------|----------------|----------------|----------------|
    |     101 |         Facebook_Focus | Google_Branded | Google_Branded |         (null) |
    |     102 |         Google_Branded | Google_Branded |   Google_Focus |         (null) |
    |     103 |         Google_Branded | Google_Branded | Google_Branded | Google_Branded |