sqlpostgresqlouter-joincuberolap

CUBE + outer join = extra NULL row


When I use PostgreSQL's CUBE on a query with an OUTER JOIN, I get an extra all-NULL row that can't be distinguished from the cube's own "everything combined" all-NULL result.

CREATE TABLE species
  ( id    SERIAL PRIMARY KEY,
    name  TEXT);

CREATE TABLE pet
  ( species_id INTEGER REFERENCES species(id),
    is_adult   BOOLEAN, 
    number     INTEGER)
;

INSERT INTO species VALUES
  (1, 'cat'), (2, 'dog');

INSERT INTO pet VALUES
  (1, true, 3), (1, false, 1), (2, true, 1), (null, true, 2);

OK, so there are 7 pets total:

SELECT SUM(number) FROM pet;
 sum
-----
   7
(1 row)

Now look at the total row of the cube:

SELECT * FROM (
        SELECT name, is_adult, SUM(number)
        FROM   pet p
        JOIN   species s ON (p.species_id = s.id)
        GROUP BY CUBE (name, is_adult)) subq
WHERE name IS NULL
AND   is_adult IS NULL;

 name | is_adult | sum
------+----------+-----
      |          |   5
(1 row)

5 pets? Oh, right, because the no-species pets aren't being included. I need an outer join.

SELECT * FROM (
        SELECT name, is_adult, SUM(number)
        FROM   pet p
        LEFT OUTER JOIN   species s ON (p.species_id = s.id)
        GROUP BY CUBE (name, is_adult)) subq
WHERE name IS NULL
AND   is_adult IS NULL;

 name | is_adult | sum 
------+----------+-----
      |          |   2
      |          |   7
(2 rows)

My cube has 2 all-null rows; the second one is the answer I wanted.

I half-understand what is going on here: NULL values are used to signal two different things ("the cube has rolled up all this column's values" or "this row has no children in the right-side table"). I just don't know how to fix it.


Solution

  • NULL values are used to signal two different things ("the cube has rolled up all this column's values" or "this row has no children in the right-side table").

    In order to distinguish one null from the other null, you can use grouping(...) function, see table 9-55 here: https://www.postgresql.org/docs/9.6/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE

    GROUPING(args...) Integer bit mask indicating which arguments are not being included in the current grouping set

    Grouping operations are used in conjunction with grouping sets (see Section 7.2.4) to distinguish result rows. The arguments to the GROUPING operation are not actually evaluated, but they must match exactly expressions given in the GROUP BY clause of the associated query level. Bits are assigned with the rightmost argument being the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the result row, and 1 if it is not.


     name | is_adult | sum 
    ------+----------+-----
          |          |   2
          |          |   7
    

    the second one is the answer I wanted.

    Try this:

    SELECT name, is_adult, SUM(number)
    FROM   pet p
    LEFT OUTER JOIN   species s ON (p.species_id = s.id)
    GROUP BY CUBE (name, is_adult)
    HAVING grouping(name,is_adult) = 3
    
    name |is_adult |sum  |
    -----|---------|-----|
         |         |7    |
    

    Please also examine this query to learn how grouping function works:

    SELECT name, is_adult, SUM(number), grouping(name,is_adult)
    FROM   pet p
    LEFT OUTER JOIN   species s ON (p.species_id = s.id)
    GROUP BY CUBE (name, is_adult)
    
    name |is_adult |sum |grouping |
    -----|---------|----|---------|
    cat  |false    |1   |0        |
    cat  |true     |3   |0        |
    cat  |         |4   |1        |
    dog  |true     |1   |0        |
    dog  |         |1   |1        |
         |true     |2   |0        |
         |         |2   |1        |
         |         |7   |3        |
         |false    |1   |2        |
         |true     |6   |2        |