sqloraclecomparisonrow-value-expression

How to compare groups of tuples in sql


How to compare groups of tuples in sql: consider the following example:

TABLE T1
--------
GROUP     VALUE
-----     -----
A         FOO
A         BAR
X         HHH
X         ZOO

TABLE T2
--------
GROUP     VALUE
-----     -----
B         ZOO
C         FOO
C         BAR

I want to write an sql query which compares the groups of values in both tables and reports the differences. In the illustrated example, the group in table a: ((A,FOO),(A,BAR)) is the same as the group ((C,FOO),(C,BAR)) even though the group name is different. What counts is that the contents of the group are the same. Finally the query would report that there is a difference: it is the (B,ZOO) tuple.

RESULT
------
GROUP     VALUE
-----     -----
B         ZOO
X         HHH
X         ZOO

Although the group X containing ZOO in T1 has a matching value in T2: (B,ZOO) it is still not a match because the group also has the (X, HHH) value which is not part of the (B, ZOO) group in T2


Solution

  • Something like this

    create table t1 (group_id varchar2(20), value varchar2(20));
    create table t2 (group_id varchar2(20), value varchar2(20));
    
    insert into t1 values ('A','FOO');
    insert into t1 values ('A','BAR');
    insert into t1 values ('X','HHH');
    insert into t1 values ('X','ZOO');
    insert into t2 values ('C','FOO');
    insert into t2 values ('C','BAR');
    insert into t2 values ('B','ZOO');
    
    
    select t1.group_id t1_group,t2.group_id t2_group, 
          --t1.all_val, t2.all_val, 
           case when t1.all_val = t2.all_val then 'match' else 'no match' end coll_match
    from 
      (select 'T1' tab_id, group_id, collect(value) all_val, 
              min(value) min_val, max(value) max_val, count(distinct value) cnt_val 
      from t1 group by group_id) t1
    full outer join
      (select 'T2' tab_id, group_id, collect(value) all_val, 
              min(value) min_val, max(value) max_val, count(distinct value) cnt_val 
      from t2 group by group_id) t2
    on t1.min_val = t2.min_val and t1.max_val = t2.max_val and t1.cnt_val = t2.cnt_val
    /
    

    I've done a preliminary elimination based on the minmium, maximum and number of distinct values in each group, which would help with large datasets. If the datasets are small enough, you may not need them.

    That tells you the matches. You just need to push it out an extra step to find the groups that don't have any matches

    select t1_group
    from
    (
      select t1.group_id t1_group,t2.group_id t2_group, 
            --t1.all_val, t2.all_val, 
             case when t1.all_val = t2.all_val then 'match' end coll_match
      from 
        (select 'T1' tab_id, group_id, collect(value) all_val
        from t1 group by group_id) t1
      cross join
        (select 'T2' tab_id, group_id, collect(value) all_val
        from t2 group by group_id) t2
    )
    group by t1_group
    having min(coll_match) is null
    /
    
    select t2_group
    from
    (
      select t1.group_id t1_group,t2.group_id t2_group, 
            --t1.all_val, t2.all_val, 
             case when t1.all_val = t2.all_val then 'match' end coll_match
      from 
        (select 'T1' tab_id, group_id, collect(value) all_val
        from t1 group by group_id) t1
      cross join
        (select 'T2' tab_id, group_id, collect(value) all_val
        from t2 group by group_id) t2
    )
    group by t2_group
    having min(coll_match) is null
    /