I got the following three tables in BigQuery along with the data below.
I'd like to write a SQL query using id and current date columns and get the following row as the output.
Expected result set:
id existing_stores missing_stores
1003812607 "3640,0130,0131,2306,3638,0127,2789,2305" "3102,2681,2686,2670,2682,3101,2673,2669,3103,2668"
These are the tables:
item
table:
id date
------------------------
1003812607 2023-08-03
1003812607 2023-08-01
1003812607 2023-07-23
1003812607 2023-06-30
item_change_history
:
createdTime docType id existing_stores
---------------------------------------------------------------------------------------------
2023-08-03 11:01:10.139617 UTC Item 1003812607 "3640,0130,0131,2306,3638,0127,2789,2305"
2023-07-01 09:01:10.139617 UTC Item 1003812607 "3640,0130,0131,2306,3638,0127,2789,2301"
mic_item_discrepancy
:
ID MISSING_STORE
-------------------------
1003812607 3102
1003812607 2681
1003812607 2686
1003812607 2670
1003812607 2682
1003812607 3101
1003812607 2673
1003812607 2669
1003812607 3103
1003812607 2668
I tried to come up with this query and it is not working as expected or giving me the wrong data given that duplicate id rows in
mic_item_discrepancy
table.
SELECT
item.id, ich.id, ich.existing_stores, mid.missing_stores
FROM
`hd-merch-prod.merch_item_cache_validation.item` item
JOIN
`hd-merch-prod.merch_item_cache.item_change_history` ich
ON item.date = "2023-08-03"
AND DATE(ich.createdTime) = item.date
AND ich.id = item.id
JOIN
`hd-merch-prod.merch_item_cache.mic_item_discrepancy` mid
ON item.id = mid.id
GROUP BY
item.id, ich.id, ich.existing_stores, mid.missing_stores;
Try to use LEFT JOIN
for both the item_change_history
and mic_item_discrepancy
tables to ensure that all rows from the item table are included in the result, and add a DISTINCT
in the STRING_AGG
function:
SELECT
item.id, ich.id, ich.existing_stores, COALESCE(mid.missing_stores, '') AS missing_stores
FROM
`hd-merch-prod.merch_item_cache_validation.item` item
LEFT JOIN
(
SELECT id, existing_stores, MAX(createdTime) as latest_createdTime
FROM
`hd-merch-prod.merch_item_cache.item_change_history`
WHERE
DATE(createdTime) = '2023-08-03'
GROUP BY
id, existing_stores
) ich ON item.id = ich.id
LEFT JOIN
(
SELECT ID, STRING_AGG(DISTINCT MISSING_STORE, ',') AS missing_stores
FROM
`hd-merch-prod.merch_item_cache.mic_item_discrepancy`
GROUP BY
ID
) mid ON item.id = mid.ID;