sqloracle-databasejoinoracle11granking

Pull data based on state,city,county setup


Hoping someone can help with the below query where I am currently stuck.

I have a setup table (based on state, city, and county) where * in the setup represents "all" else it represents the specific value it holds. I need to extract data from the main table joining with setup table so that it only pulls population as defined in the setup table.

State City County
AZ * *
CA * Los Angeles
CA San Diego *
CA San Francisco *
CA Santa Monica *
NY * *
NY * Bronx
NY * Kings

The main table is the "person" table which has the state, city and county columns with actual person location values. I need to be able to join this with the setup table so that it only pulls the applicable rows, where * in setup table represents "all". State will always have a valid value and will never be *

Based on the above setup table, the query should pull

a) All from state NY and AZ (As it has * in both city and county)

b) from CA - all belonging to city San Diego, San Francisco and Santa Monica and all belonging to County Los Angeles.

I am getting duplicates in the current query (below). I have tried a few options like using a distinct in the final query to weed out duplicates or looking at anything else in the data that can be used instead.

SELECT  
    e.emplid,
    e.first_name,
    e.last_name ,
    e.state,
    e.county,
    e.city
   
FROM
    person e
    
    JOIN setup S 
       ON e.state = S.State
          AND (e.city = S.city OR S.city = '*') 
          AND (e.county = S.county OR S.county = '*')
          AND sysdate between S.eff_dt and S.end_eff_dt    
WHERE
    sysdate between e.eff_dt and e.end_eff_dt 
        AND e.hr_Status <> 'T'
        AND e.STAFF = 'N'
    And e.union = 'NU'
    AND e.country = 'USA';

Solution

  • The duplicates exist because the setup table have overlapping natural ranges and your join syntax is allowing matches on multiple setup records.

    To explain, focus on this setup data:

    State City County
    NY * *
    NY * Bronx
    NY * Kings

    That first NY row says to match Any city and Any county in NY. But if you have a person who is also from Kings, then their record will match 2 of the setup rows.

    Using DISTINCT is a possible solution to this, but will not work if you expand the records to include the matched columns in the Setup table. The following should work

    SELECT DISTINCT e.emplid, e.first_name, e.last_name, e.city, e.county, e.state
    FROM
        person e
        
        JOIN setup S 
           ON e.state = S.State
              AND (e.city = S.city OR S.city = '*') 
              AND (e.county = S.county OR S.county = '*')
    

    In application design, you might choose to be pragmatic and put checks to prevent the user creating multiple overlapping criteria so that you can avoid DISTINCT however there are US Cities that legitimately span multiple Counties so enforcing cleaner data might be more effort than it is worth.

    The other approach is to use EXISTS instead of joining, this is especially useful if you do NOT need a reference to the specific record from Setup that was matched.

    SELECT e.EmplId, e.Name, e.State, e.City, e.County
    FROM
        person e
    WHERE EXISTS (SELECT 1 FROM setup S   
                  WHERE e.state = S.State
                  AND (e.city = S.city OR S.city = '*') 
                  AND (e.county = S.county OR S.county = '*')
                  )
    

    Proof and play with example data here: https://www.db-fiddle.com/f/uZ2AH8xwyb34HNhBEgtzZW/0