sqlsnowflake-cloud-data-platform

Create binary variable after grouping


I'm trying to create a binary flag variable where this new column indicates if an ID has an item, from a long list of items. The table is called items and exists in Snowflake.

My table looks like

ID | Items |
------------
1  |  A    |
1  |  B    |
1  |  C    |
1  |  X    |
2  |  N    |
2  |  M    |
2  |  P    |
3  |  R    |
3  |  T    |
3  |  X    |

I want to create column that shows which ID's have item X. My expected output is

ID | HasX |
-----------
1  |   1  |
2  |   0  |
3  |   1  |

I've tried a couple different options, but when I do

SELECT 
  i.ID,
  CASE
    WHEN Items IN ('X') 
    THEN 1 OVER (PARTITION BY ID)
    ELSE 0
  END AS HasX
FROM items i

which gives me an error Invalid function '1'.

This is probably simple, but I haven't found a solution anywhere yet. I suspect I need to use a window function like this, but I haven't figured out how to with CASE WHEN.


Solution

  • This is called conditional aggregation.

    SELECT ID,
           SUM(CASE WHEN Items = 'X' THEN 1 ELSE 0 END) AS HasX 
    FROM items
    GROUP BY ID;
    
    

    You could use MAX function as well.

    SELECT ID,
           MAX(CASE WHEN Items = 'X' THEN 1 ELSE 0 END) AS HasX
    FROM items
    GROUP BY ID;
    

    Another option would be using EXISTS

    SELECT DISTINCT ID,
           CASE WHEN EXISTS ( SELECT 1 
                              FROM items i2 
                              WHERE i2.ID = i1.ID AND i2.Items = 'X'
                            ) THEN 1 
               ELSE 0 
           END AS HasX
    FROM items i1;
    

    See example