I am trying to compare 2 tables (cloned table x original) using Full Outer Join on Snowflake but not getting the results I am expecting:
select
'invoice' as TABLE_NAME,
to_date(prd.date) as TRANSACTION_DATE,
count(prd.date) as PROD_COUNT,
from invoice_latest prd
full outer join
(
select
'invoice_clone' as TABLE_NAME,
to_date(date) AS TRANSACTION_DATE,
count(1) as CLONE_COUNT
from invoice_latest_clone
where to_date(date) >= '2024-04-01'
group by to_date(date)
order by to_date(date)
) cln
on to_date(prd.date) = cln.TRANSACTION_DATE
where to_date(prd.date) >= '2024-04-02' OR cln.TRANSACTION_DATE is not null
group by to_date(prd.date)
order by to_date(prd.date)
I expect to have NULL for the counts related to the original table for the 1st April but getting the below instead:
The problem is that you are joining and then filtering the results of the join, and then counting.
You should filter first and then do the join.
A query similar to yours that gives similar bad result:
select 'invoice' as table_name
, o_orderdate as transaction_date
, count(prd.o_orderdate) as prod_count
, max(clone_count)
from snowflake_sample_data.tpch_sf1.orders prd
full outer join (
select 'invoice_clone' as table_name
, o_orderdate as transaction_date
, count(1) as clone_count
from snowflake_sample_data.tpch_sf1.orders
where o_orderdate >= '1998-07-29'
group by o_orderdate
order by o_orderdate
) cln
on prd.o_orderdate = cln.transaction_date
where o_orderdate >= '1998-07-30' or cln.transaction_date is not null
group by prd.o_orderdate
order by 2
Query fixed:
select 'invoice' as table_name
, transaction_date
, count(prd.o_orderdate) as prod_count
, max(clone_count)
from (
select *
from snowflake_sample_data.tpch_sf1.orders
where o_orderdate >= '1998-07-30'
) prd
full outer join (
select 'invoice_clone' as table_name
, o_orderdate as transaction_date
, count(1) as clone_count
from snowflake_sample_data.tpch_sf1.orders
where o_orderdate >= '1998-07-29'
group by o_orderdate
order by o_orderdate
) cln
on prd.o_orderdate = cln.transaction_date
where cln.transaction_date is not null
group by transaction_date
order by 2