sqlsnowflake-cloud-data-platformfull-outer-join

Snowflake SQL Compare Clone x Original Counts


I am trying to compare 2 tables (cloned table x original) using Full Outer Join on Snowflake but not getting the results I am expecting:

select 
    'invoice' as TABLE_NAME, 
    to_date(prd.date) as TRANSACTION_DATE,
    count(prd.date) as PROD_COUNT,
  from invoice_latest prd
full outer join
(
    select 
    'invoice_clone' as TABLE_NAME, 
    to_date(date) AS TRANSACTION_DATE,
    count(1) as CLONE_COUNT
    from invoice_latest_clone
    where to_date(date) >= '2024-04-01'
    group by to_date(date)
    order by to_date(date)
) cln
on to_date(prd.date) = cln.TRANSACTION_DATE
where to_date(prd.date) >= '2024-04-02' OR cln.TRANSACTION_DATE is not null
group by to_date(prd.date)
order by to_date(prd.date)

I expect to have NULL for the counts related to the original table for the 1st April but getting the below instead:

enter image description here


Solution

  • The problem is that you are joining and then filtering the results of the join, and then counting.

    You should filter first and then do the join.

    A query similar to yours that gives similar bad result:

    select 'invoice' as table_name
        , o_orderdate as transaction_date
        , count(prd.o_orderdate) as prod_count
        , max(clone_count)
    from snowflake_sample_data.tpch_sf1.orders prd
    full outer join (
        select 'invoice_clone' as table_name
            , o_orderdate as transaction_date
            , count(1) as clone_count
            from snowflake_sample_data.tpch_sf1.orders 
            where o_orderdate >= '1998-07-29'
            group by o_orderdate
            order by o_orderdate        
    ) cln
    on prd.o_orderdate = cln.transaction_date
    where o_orderdate >= '1998-07-30' or cln.transaction_date is not null
    group by prd.o_orderdate
    order by 2
    

    enter image description here

    Query fixed:

    select 'invoice' as table_name
        , transaction_date
        , count(prd.o_orderdate) as prod_count
        , max(clone_count)
    from (
        select *
        from snowflake_sample_data.tpch_sf1.orders 
        where o_orderdate >= '1998-07-30'
    ) prd
    full outer join (
        select 'invoice_clone' as table_name
            , o_orderdate as transaction_date
            , count(1) as clone_count
            from snowflake_sample_data.tpch_sf1.orders 
            where o_orderdate >= '1998-07-29'
            group by o_orderdate
            order by o_orderdate        
    ) cln
    on prd.o_orderdate = cln.transaction_date
    where cln.transaction_date is not null
    group by transaction_date
    order by 2
    

    enter image description here