sql cognos-11

Remove duplicates from single field only in rollup query


I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.

I am writing a query for Cognos 11 which summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for individual locations and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM expected value where the location is distinct).

Below is a simplified version of the query. Is this even possible or will I have to write a separate query in Cognos and make it two reports that will have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.

SELECT COALESCE(CASE 
                WHEN location_class = 'A'
                    THEN 'Active'
                WHEN location_class = 'C'
                    THEN 'Active'
                WHEN location_class IN (
                        'R'
                        ,'0'
                        )
                    THEN 'Reserve'
                END, 'Grand Total') "Row Labels"
        ,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
        ,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
        ,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
        ,COUNT(DISTINCT location) "Count of Locations"
        ,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
    FROM audit_table
    WHERE audit_datetime <= #prompt('EndDate') # AND audit_datetime >= #prompt('StartDate') #
    GROUP BY ROLLUP(CASE 
                WHEN location_class = 'A'
                    THEN 'Active'
                WHEN location_class = 'C'
                    THEN 'Active'
                WHEN location_class IN (
                        'R'
                        ,'0'
                        )
                    THEN 'Reserve'
                END)
    ORDER BY 1 ASC

This is what I'm hoping to end up with:

[Image: End Goal]

Thanks for any help!


Solution

  • Have you looked at the OVER clause in SQL? It lets you use window functions within a result set, so you can compute aggregates over a different partitioning than the query's main grouping. That should help here, since you seem to be trying to sum data by one grouping nested within a larger one.

    For example, let's say we have the below dataset:

    group1      group2      val         dateadded
    ----------- ----------- ----------- -----------------------
    1           1           1           2020-11-18
    1           1           1           2020-11-20
    1           2           10          2020-11-18
    1           2           10          2020-11-20
    2           3           100         2020-11-18
    2           3           100         2020-11-20
    2           4           1000        2020-11-18
    2           4           1000        2020-11-20
    

    Using a single query, we can return both the sum of "val" over "group1" and the sum of the first (by "dateadded") "val" record in each "group2":

    declare @table table (group1 int, group2 int, val int, dateadded datetime)
    insert into @table values (1, 1, 1, getdate())
    insert into @table values (1, 1, 1, dateadd(day, 1, getdate()))
    insert into @table values (1, 2, 10, getdate())
    insert into @table values (1, 2, 10, dateadd(day, 1, getdate()))
    insert into @table values (2, 3, 100, getdate())
    insert into @table values (2, 3, 100, dateadd(day, 1, getdate()))
    insert into @table values (2, 4, 1000, getdate())
    insert into @table values (2, 4, 1000, dateadd(day, 1, getdate()))
    
    select t.group1, sum(t.val) as group1_sum, group2_first_val_sum
    from @table t
    inner join
    (
        select group1, sum(group2_first_val) as group2_first_val_sum
        from
        (
            select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
            from @table
        ) y
        where rownumber = 1
        group by group1
        
    ) x on t.group1 = x.group1
    group by t.group1, x.group2_first_val_sum
    

    This returns the below result set:

    group1      group1_sum  group2_first_val_sum
    ----------- ----------- --------------------
    1           22          11
    2           2200        1100
    

    The innermost subquery in the joined table numbers the rows within each "group2" by "dateadded", so every record gets a 1 or a 2 in the "rownumber" column, since there are only two records in each "group2".

    The next subquery keeps only the first row of each "group2" (rownumber = 1) and sums "val" by "group1".

    The main query gets the sum of "val" in each "group1" from the main table and then joins on the subqueried table to get the "val" sum of only the first records in each "group2".

    There are more efficient ways to write this, such as moving the summation of the "group1" values into a subquery in the SELECT list to get rid of one of the nested derived tables, but I wanted to show how to do it without subqueries in the SELECT list.
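
Applied to the query in the question, the same row-numbering idea might look like the sketch below. It is untested and rests on a few assumptions: the database accepts Oracle-style syntax (the question already uses NVL and ROLLUP), each location belongs to a single location class, and the "first" audit for a location is the one with the earliest audit_datetime inside the prompted date range. The names class_group, rn and a are made up for illustration, and the NULLIF guard against a zero denominator is an addition that the original query does not have.

```sql
SELECT COALESCE(class_group, 'Grand Total') AS "Row Labels"
        -- expected cost is counted once per location (first audit only)
        ,SUM(CASE WHEN rn = 1 THEN NVL(expected_cost, 0) ELSE 0 END) AS "Sum of Expected Cost"
        -- variance columns still include every audit
        ,SUM(NVL(variance_cost, 0)) AS "Sum of Variance Cost"
        ,SUM(ABS(NVL(variance_cost, 0))) AS "Sum of Absolute Cost"
        ,COUNT(DISTINCT location) AS "Count of Locations"
        -- NULLIF returns NULL instead of raising a divide-by-zero error (assumed addition)
        ,SUM(NVL(variance_cost, 0))
            / NULLIF(SUM(CASE WHEN rn = 1 THEN NVL(expected_cost, 0) ELSE 0 END), 0) AS "Variance"
    FROM (
        SELECT location
                ,expected_cost
                ,variance_cost
                ,CASE
                    WHEN location_class IN ('A', 'C')
                        THEN 'Active'
                    WHEN location_class IN ('R', '0')
                        THEN 'Reserve'
                    END AS class_group
                -- number each location's audits within the date range, earliest first
                ,ROW_NUMBER() OVER (PARTITION BY location ORDER BY audit_datetime) AS rn
            FROM audit_table
            WHERE audit_datetime >= #prompt('StartDate')#
                AND audit_datetime <= #prompt('EndDate')#
        ) a
    GROUP BY ROLLUP(class_group)
    ORDER BY 1 ASC
```

Because the expected cost is summed only where rn = 1, each location contributes its expected cost once, while the variance columns still cover all audits. Note that "Variance" now divides by the deduplicated expected cost; if the ratio should use the full sum instead, keep the original expression for that column.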