Hoping someone can help with the below query where I am currently stuck.
I have a setup table (based on state, city, and county) where * in the setup represents "all" else it represents the specific value it holds. I need to extract data from the main table joining with setup table so that it only pulls population as defined in the setup table.
State | City | County |
---|---|---|
AZ | * | * |
CA | * | Los Angeles |
CA | San Diego | * |
CA | San Francisco | * |
CA | Santa Monica | * |
NY | * | * |
NY | * | Bronx |
NY | * | Kings |
The main table is the "person" table which has the state, city and county columns with actual person location values. I need to be able to join this with the setup table so that it only pulls the applicable rows, where * in setup table represents "all". State will always have a valid value and will never be *
Based on the above setup table, the query should pull
a) All from state NY and AZ (As it has * in both city and county)
b) from CA - all belonging to city San Diego, San Francisco and Santa Monica and all belonging to County Los Angeles.
I am getting duplicates in the current query (below). I have tried a few options like using a distinct in the final query to weed out duplicates or looking at anything else in the data that can be used instead.
SELECT
e.emplid,
e.first_name,
e.last_name ,
e.state,
e.county,
e.city
FROM
person e
JOIN setup S
ON e.state = S.State
AND (e.city = S.city OR S.city = '*')
AND (e.county = S.county OR S.county = '*')
AND sysdate between S.eff_dt and S.end_eff_dt
WHERE
sysdate between e.eff_dt and e.end_eff_dt
AND e.hr_Status <> 'T'
AND e.STAFF = 'N'
And e.union = 'NU'
AND e.country = 'USA';
The duplicates exist because the setup table have overlapping natural ranges and your join syntax is allowing matches on multiple setup records.
To explain, focus on this setup data:
State | City | County |
---|---|---|
NY | * | * |
NY | * | Bronx |
NY | * | Kings |
That first NY row says to match Any city and Any county in NY. But if you have a person who is also from Kings, then their record will match 2 of the setup rows.
Using DISTINCT
is a possible solution to this, but will not work if you expand the records to include the matched columns in the Setup
table. The following should work
SELECT DISTINCT e.emplid, e.first_name, e.last_name, e.city, e.county, e.state
FROM
person e
JOIN setup S
ON e.state = S.State
AND (e.city = S.city OR S.city = '*')
AND (e.county = S.county OR S.county = '*')
In application design, you might choose to be pragmatic and put checks to prevent the user creating multiple overlapping criteria so that you can avoid
DISTINCT
however there are US Cities that legitimately span multiple Counties so enforcing cleaner data might be more effort than it is worth.
The other approach is to use EXISTS
instead of joining, this is especially useful if you do NOT need a reference to the specific record from Setup
that was matched.
SELECT e.EmplId, e.Name, e.State, e.City, e.County
FROM
person e
WHERE EXISTS (SELECT 1 FROM setup S
WHERE e.state = S.State
AND (e.city = S.city OR S.city = '*')
AND (e.county = S.county OR S.county = '*')
)
Proof and play with example data here: https://www.db-fiddle.com/f/uZ2AH8xwyb34HNhBEgtzZW/0