sqlsnowflake-cloud-data-platformscaffolding

Scaffolding to find missing values in Snowflake


I have a time series data set. For each date, I should have the following:

Category Sub-category Value
A A1 1
A A2 2
B B1 3
B B2 4

However, I do have sometimes missing values, something like:

Category Sub-category Value
A A1 1
B B2 4

What I want to do is identify missing values for each date. So with the above, it would be:

Category Sub-category Value
A A1 1
A A2 NULL
B B1 NULL
B B2 4

To do that, I have created a table like this in Snowflake called scaffold:

Category Sub-category
A A1
A A2
B B1
B B2

And I have written the following query:

select 
a.date
,b.category
,b.subcategory
,a.value

  
from main_table a
right outer join scaffold b
on a.category=b.category
and a.subcategory=b.subcategory

Strangely, my output stays the same as illustrated below - it's as if I wasn't using this scaffold:

Category Sub-category Value
A A1 1
B B2 4

I've tried changing the join to a full outer join, to no avail. Any help appreciated!


Solution

  • I would recommend cross joining the expected categories / subcategories (scaffold) with the list of distinct dates that are available in the main table.

    This gives you all possible combinations (ie all rows of the resultset); then, the main table can be brought with a left join - when the join does not match, the value end up as null:

    select d.date, s.category, s.subcategory, m.value
    from (select distinct date from main_table) d
    cross join scaffold s
    left join main_table m 
        on  m.date        = d.date 
        and m.category    = s.category
        and m.subcategory = s.subcategory
    

    Ideally you would have a separate table that stores the dates only, which we would use instead of select distinct in subquery d.