sqlgoogle-bigqueryadobe-analytics

SQL - Parse a field and SUM numbers at regular delimiter intervals


I request your help for an issue beyond my current skills...

I'm using Google Big Query to store analytics data about my website, and to calculate the revenue I have a quite difficult query to build.

We have the field %product% which is formatted as following :

;%productID%;%productQuantity%;%productRevenue%;;

If more than one product has been bought, the different products data will be delimited by ",", which can give this :

;12345678;1;49.99;;,;45678912;1;54.99;;
;45678912;2;59.98;;,;14521452;2;139.98;;,;12345678;2;19.98;;
;14521452;1;54.99;;

The only way to calculate the revenue is to sum all the different %productRevenue% from a line and store this into a column.

I have no idea how to do it just with a SQL query... Maybe with RegEx ? Any idea ? I'd like to create a view with that info to easily pull the data into PowerBI then. But maybe I should process that with M directly in PBI ?

Thanks a lot,

Alex


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT 
      SPLIT(i, ';')[OFFSET(1)] productID,
      SUM(CAST(SPLIT(i, ';')[OFFSET(2)] AS INT64)) productQuantity,
      SUM(CAST(SPLIT(i, ';')[OFFSET(3)] AS FLOAT64)) productRevenue
    FROM `project.dataset.table`,
    UNNEST(SPLIT(product)) i
    GROUP BY productID   
    

    if to apply to sample data from your question - output is

    Row productID   productQuantity productRevenue   
    1   12345678    3               69.97    
    2   45678912    3               114.97   
    3   14521452    3               194.97