sqlgoogle-bigqueryreportname-value

Query that displays which attributes an instance possesses, out of a superset of attributes


I have a relational dataset in Bigquery containing two tables.

The first table holds customer data

+-------------+--------+
| Customer ID | Name   |
+-------------+--------+
| 1           | Bob    |
+-------------+--------+
| 2           | Jenny  |
+-------------+--------+
| 3           | Janice |
+-------------+--------+

The second table holds various name/value pairs associated to the customer in first table:

+-------------+----------+-------+
| Customer ID | Category | Value |
+-------------+----------+-------+
| 1           | A        | A     |
+-------------+----------+-------+
| 1           | A        | B     |
+-------------+----------+-------+
| 1           | B        | A     |
+-------------+----------+-------+
| 2           | B        | B     |
+-------------+----------+-------+

I would like to generate a report that enumerates each customer, and sets a TRUE under each name:value where it is found in table 2 e.g:

+-------------+------+------+-----+------+------+
| Customer ID | A:A  | A:B  | A:C | B:A  | B:B  |
+-------------+------+------+-----+------+------+
| 1           | TRUE | TRUE |     | TRUE |      |
+-------------+------+------+-----+------+------+
| 2           |      |      |     |      | TRUE |
+-------------+------+------+-----+------+------+
| 3           |      |      |     |      |      |
+-------------+------+------+-----+------+------+

I've tried to specify each of the category:value combinations as columns in my select statement

select
  customer id,
  a:a, 
  a:b, 
  a:c, 
  b:a,
  b:b 
from 
  table_1 t1
join 
  table_2 t2 
on
  t1.customer_id = t2.customer_id  

But that gives me nothing because I don't know how to get the query to set a cell to TRUE, once the value is found.

Apologies if this is obvious, i'm painfully new to SQL.


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT customer_id,
      LOGICAL_OR((category, value) = ('A', 'A')) AS a_a,
      LOGICAL_OR((category, value) = ('A', 'B')) AS a_b,
      LOGICAL_OR((category, value) = ('A', 'C')) AS a_c,
      LOGICAL_OR((category, value) = ('B', 'A')) AS b_a,
      LOGICAL_OR((category, value) = ('B', 'B')) AS b_b
    FROM `project.dataset.table1`  
    JOIN `project.dataset.table2`
    USING (customer_id)
    GROUP BY customer_id   
    

    You can test, play with above using sample data from your question as in example below

    #standardSQL
    WITH `project.dataset.table1` AS (
      SELECT 1 Customer_ID, 'Bob' Name UNION ALL
      SELECT 2, 'Jenny' UNION ALL
      SELECT 3, 'Janice' 
    ), `project.dataset.table2` AS (
      SELECT 1 Customer_ID, 'A' Category, 'A' Value UNION ALL
      SELECT 1, 'A', 'B' UNION ALL
      SELECT 1, 'B', 'A' UNION ALL
      SELECT 2, 'B', 'B' 
    )
    SELECT customer_id,
      LOGICAL_OR((category, value) = ('A', 'A')) AS a_a,
      LOGICAL_OR((category, value) = ('A', 'B')) AS a_b,
      LOGICAL_OR((category, value) = ('A', 'C')) AS a_c,
      LOGICAL_OR((category, value) = ('B', 'A')) AS b_a,
      LOGICAL_OR((category, value) = ('B', 'B')) AS b_b
    FROM `project.dataset.table1`  
    JOIN `project.dataset.table2`
    USING (customer_id)
    GROUP BY customer_id   
    

    with result

    Row customer_id a_a     a_b     a_c     b_a     b_b  
    1   1           true    true    false   true    false    
    2   2           false   false   false   false   true      
    

    In case if you need/want output exactly as in your question - you can use below adjusted version

    #standardSQL
    SELECT customer_id,
      IF(LOGICAL_OR((category, value) = ('A', 'A')), 'TRUE', '') AS a_a,
      IF(LOGICAL_OR((category, value) = ('A', 'B')), 'TRUE', '') AS a_b,
      IF(LOGICAL_OR((category, value) = ('A', 'C')), 'TRUE', '') AS a_c,
      IF(LOGICAL_OR((category, value) = ('B', 'A')), 'TRUE', '') AS b_a,
      IF(LOGICAL_OR((category, value) = ('B', 'B')), 'TRUE', '') AS b_b
    FROM `project.dataset.table1`  
    JOIN `project.dataset.table2`
    USING (customer_id)
    GROUP BY customer_id
    

    with result

    Row customer_id a_a     a_b     a_c     b_a     b_b  
    1   1           TRUE    TRUE            TRUE         
    2   2                                           TRUE     
    

    Note: in above examples - you don't really need joins as you are not using fields from table1 rather than as a filter (to present only users from table1)