oracle-databaseanalyticsminimum-size

Oracle Analytic Question


Given a function zipdistance(zipfrom,zipto) which calculates the distance (in miles) between two zip codes and the following tables:

create table zips_required(
   zip varchar2(5)
);

create table zips_available(
   zip varchar2(5),
   locations number(100)
);

How can I construct a query that will return to me each zip code from the zips_required table and the minimum distance that would produce a sum(locations) >= n.

Up till now we've just run an exhaustive loop querying for each radius until we've met the criteria.

--Do this over and over incrementing the radius until the minimum requirement is met
select count(locations) 
from zips_required zr 
left join zips_available za on (zipdistance(zr.zip,za.zip)< 2) -- Where 2 is the radius

This can take a while on a large list. It feels like this could be done with an oracle analytic query along the lines of:

min() over (
  partition by zips_required.zip 
  order by zipdistance( zips_required.zip, zips_available.zip)
  --range stuff here?
) 

The only analytic queries I have done have been "row_number over (partition by order by)" based, and I'm treading into unknown areas with this. Any guidance on this is greatly appreciated.


Solution

  • This is what I came up with :

    SELECT zr, min_distance
      FROM (SELECT zr, min_distance, cnt, 
                   row_number() over(PARTITION BY zr ORDER BY min_distance) rnk
               FROM (SELECT zr.zip zr, zipdistance(zr.zip, za.zip) min_distance,
                             COUNT(za.locations) over(
                                 PARTITION BY zr.zip 
                                 ORDER BY zipdistance(zr.zip, za.zip)
                             ) cnt
                        FROM zips_required zr
                       CROSS JOIN zips_available za)
              WHERE cnt >= :N)
     WHERE rnk = 1
    
    1. For each zip_required calculate the distance to the zip_available and sort them by distance
    2. For each zip_required the count with range allows you to know how many zip_availables are in the radius of that distance.
    3. filter (first where COUNT(locations) > N)

    I used to create sample data:

    INSERT INTO zips_required
       SELECT to_char(10000 + 100 * ROWNUM) FROM dual CONNECT BY LEVEL <= 5;
    
    INSERT INTO zips_available
       (SELECT to_number(zip) + 10 * r, 100 - 10 * r FROM zips_required, (SELECT ROWNUM r FROM dual CONNECT BY LEVEL <= 9));
    
    CREATE OR REPLACE FUNCTION zipdistance(zipfrom VARCHAR2,zipto VARCHAR2) RETURN NUMBER IS
    BEGIN
       RETURN abs(to_number(zipfrom) - to_number(zipto));
    END zipdistance;
    /
    

    Note: you used COUNT(locations) and SUM(locations) in your question, I assumed it was COUNT(locations)