sqldata-analysisazure-databricksdatabricks-sqlexploratory-data-analysis

How to create a list considering a condition in sql (azure databricks)


I am working on a SQL query (Azure Databricks environment), where considering the following dataset:

clientid visited channel purchase visit_order
123 abc133 google 0 1
123 efg446 facebook 0 2
123 gij729 instagram 1 3
456 klm183 google 0 1
456 nop496 linkedin 0 2
456 qrs729 pinterest 1 3
456 tuv894 google 0 1
456 wyz634 instagram 0 2

I want to get the following output:

clientid user_journey conversion
123 google, facebook, instagram 1
456 google, linkedin, interest 1
456 google, instagram 0

where the user_jorney column is composed of the channels that participated in a conversion journey. Note that the journey of users who, until then, have not made a purchase is also built.

Looking for commands that can help with this task, I found concat_ws, where I wrote the code below:

select 
  clientid, 
  concat_ws(',', collect_list(channel)) as user_journey, 
  sum(purchase) as conversion
from table_name group by  clientid;

I get this result:

clientid user_journey conversion
123 google, facebook, instagram 1
456 google, linkedin, pinterest, google, instagram 1

Now I'm trying to consider a condition to get the desired result but so far I haven't been able to find.

Could you help me how can i solve this task?


Solution

  • I tried to repro your scenario,

    where I gave sub query instead of original table in that query, I am selecting original table along with one extra column as row number where I am giving row number two every row partitioned by visit_order column and orderd by visited column.

    My Query:

    select 
      clientid,
      concat_ws(',', collect_list(channel)) as user_journey, 
      sum(purchase) as conversion
    from (SELECT *,ROW_NUMBER() OVER (PARTITION BY visit_order ORDER BY visited) AS RowNumber FROM docs) as docstb group by clientid , RowNumber order by clientid asc
    

    Execution and Output:

    enter image description here