The below code (Databricks SQL) produces the table following it. I am trying to adjust this code so that the output only includes zip5 records that have only 1 (or less) of each facility_type associated with it. Facility_type has 4 possible values (Hospital, ASC, Other, and null). In the table below, I want zip5 (10003) to be output, as it has only 1 of each of it's associated facility_types. Zip5 10016 would not be output, as it has 2 Hospital values. Zip5 10021 has 3 values with a Hospital facility_type, so it would also not be output. Zip5 10025 and 10029 would both be output.
I've tried using different having statements, but they all have allowed some unwanted zip5's to sneak into the output. For example, the following statement allows zip5 10016 into the output.
How can I achieve what I need to here? Is the having statement not the way to go?
HAVING (COUNT(DISTINCT ok.facility_type) <= 1 and count(distinct a.org_id) <=1)
SELECT a.zip5, a.org_id, ok.facility_type
FROM sales_table a
LEFT JOIN (SELECT ok.org_id,
CASE WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc
ELSE 'Other'
END AS facility_type
FROM ref_table1 ok
LEFT JOIN ref_table2 cot ON ok.ID = cot.ID) ok ON a.org_id = ok.org_id
GROUP BY a.zip5, a.org_id, ok.facility_type
zip5 | org_id | facility_type |
---|---|---|
10003 | 845307 | Other |
10003 | 001564 | Hospital |
10003 | 006054 | null |
10016 | 932258 | Hospital |
10016 | 005484 | Hospital |
10016 | null | null |
10016 | 584790 | ASC |
10021 | 005491 | Hospital |
10021 | 005154 | Hospital |
10021 | 002166 | Hospital |
10021 | null | null |
10025 | 001565 | Hospital |
10029 | 005425 | Other |
10029 | 005483 | Hospital |
-- Step 1: Calculate the counts of each facility_type per zip5
WITH facility_counts AS (
SELECT
a.zip5,
ok.facility_type,
COUNT(a.org_id) AS facility_count
FROM sales_table a
LEFT JOIN (
SELECT
ok.org_id,
CASE
WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc
ELSE 'Other'
END AS facility_type
FROM ref_table1 ok
LEFT JOIN ref_table2 cot ON ok.ID = cot.ID
) ok ON a.org_id = ok.org_id
GROUP BY a.zip5, ok.facility_type
)
-- Step 2: Filter zip5s where any facility_type count is greater than 1
, filtered_zip5s AS (
SELECT zip5
FROM facility_counts
WHERE facility_count > 1
GROUP BY zip5
)
-- Step 3: Select zip5s that do not appear in the filtered_zip5s
SELECT a.zip5, a.org_id, ok.facility_type
FROM sales_table a
LEFT JOIN (
SELECT
ok.org_id,
CASE
WHEN cot.COT_DESC IN ('Outpatient') THEN 'ASC'
WHEN cot.cot_desc IN ('Hospital') THEN cot.cot_desc
ELSE 'Other'
END AS facility_type
FROM ref_table1 ok
LEFT JOIN ref_table2 cot ON ok.ID = cot.ID
) ok ON a.org_id = ok.org_id
WHERE a.zip5 NOT IN (SELECT zip5 FROM filtered_zip5s)
GROUP BY a.zip5, a.org_id, ok.facility_type
ORDER BY a.zip5, a.org_id;