sqloracle-databasejoinplsqlfull-outer-join

Generate Result based on max count in secondary column after a join


I have two tables which have a common key between them, and quite a lot of other important infos ; for the sake of simplicity i will be using Combination A and Combination B. When a combination is met, whichever table has the maximum number of records should be the source where i collect the information ; in this case say IDs. The priority when counts are same is Table1.

COMMONKEY column is the combination/join condition in my tables.

    (Table 1)

  SELECT '123' table1_id,'Comb A' commonkey from dual UNION
  SELECT '124' table1_id,'Comb A' commonkey from dual UNION
  SELECT '125' table1_id,'Comb A' commonkey from dual UNION
  SELECT '126' table1_id,'Comb A' commonkey from dual UNION
  SELECT '215' table1_id,'Comb B' commonkey from dual UNION
  SELECT '216' table1_id,'Comb B' commonkey from dual UNION
  SELECT '559' table1_id,'Random Combination 1' commonkey from dual UNION
  SELECT '560' table1_id,'Random Combination 2' commonkey from dual ;   
                                 
    ( Table 2 )     
        
  SELECT 'abc1' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc2' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc3' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'abc4' table2_id,'Comb A' commonkey from dual  UNION
  SELECT 'xyz1' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz2' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz3' table2_id,'Comb B' commonkey from dual  UNION
  SELECT 'xyz2' table2_id,'Comb B' commonkey from dual  UNION 
  SELECT '416abc1' table2_id,'Random Combination 91' commonkey from dual UNION
  SELECT '416abc2' table2_id,'Random Combination 92' commonkey from dual;
  
    
    

Result Set Expected :

ID        COMMONKEY         
123       Comb A            
124       Comb A            
125       Comb A            
126       Comb A            
xyz1      Comb B            
xyz2      Comb B            
xyz3      Comb B            
559       Random Combination 1          
560       Random Combination 1          
416abc1   Random Combination 91         
416abc2   Random Combination 92 

Updated Image :

( the image shows a screenshot of the trail data in an excel; The Requirement and Strategy are color mapped to make it quickly understandable )

I need to generate the result set using SQL as follows :

When table1.commonkey = table2.commonkey hits, I need to-

Edit : I've initially gone on routes with

  a left join b where b.key IS null ;
  a full outer join b where b.key IS NULL or a.key is NULL ;

to achieve workarounds with A-B or B-A result sets but both these approaches were quite wrong. Gathering Delta sets or Exclusion sets didnt go well.


Solution

  • Here's one option; see comments within code

    SQL> with
      2  -- sample data
      3  a (id, ckey) as
      4    (select '123', 'ca' from dual union all
      5     select '124', 'ca' from dual union all
      6     select '125', 'ca' from dual union all
      7     select '126', 'ca' from dual union all
      8     select '215', 'cb' from dual union all
      9     select '216', 'cb' from dual union all
     10     select '551', 'r1' from dual union all
     11     select '552', 'r2' from dual
     12    ),
     13  b (id, ckey) as
     14    (select 'abc1', 'ca' from dual union all
     15     select 'abc2', 'ca' from dual union all
     16     select 'abc3', 'ca' from dual union all
     17     select 'abc4', 'ca' from dual union all
     18     select 'xyz1', 'cb' from dual union all
     19     select 'xyz2', 'cb' from dual union all
     20     select 'xyz3', 'cb' from dual union all
     21     select '9991', 'r3' from dual union all
     22     select '9992', 'r4' from dual
     23    ),
    

     24  -- count rows per each CKEY (common key)
     25  tempa as
     26    (select id, ckey, count(*) over (partition by ckey) cnt
     27     from a
     28    ),
     29  tempb as
     30    (select id, ckey, count(*) over (partition by ckey) cnt
     31     from b
     32    )
     33  -- final query
     34  select distinct
     35    case when a.cnt >= b.cnt then a.id
     36         else b.id
     37    end id,
     38    a.ckey
     39  from tempa a join tempb b on b.ckey = a.ckey
     40  union all
     41  select ckey, id from a
     42    where not exists (select null from b where a.ckey = b.ckey)
     43  union all
     44  select ckey, id from b
     45    where not exists (select null from a where a.ckey = b.ckey)
     46  order by 1, 2;
    

    which results in

    ID   CKEY
    ---- -----
    r1   551
    r2   552
    r3   9991
    r4   9992
    xyz1 cb
    xyz2 cb
    xyz3 cb
    123  ca
    124  ca
    125  ca
    126  ca
    
    11 rows selected.
    
    SQL>