sqlpostgresqlleft-joinaggregate-functions

Postgres LEFT JOIN with SUM, missing records


I am trying to get the count of certain types of records in a related table. I am using a left join.

I have a query that isn't right and one that is. The correct results query has a higher execution cost. I'd like to use the first approach, if I can correct the results.

See http://sqlfiddle.com/#!15/7c20b/5/2.

CREATE TABLE people(
  id SERIAL,
  name varchar not null
);

CREATE TABLE pets(
  id SERIAL,
  name varchar not null, 
  kind varchar not null,
  alive boolean not null default false,
  person_id integer not null
);

INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive

INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);

My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:

| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
|  1 |                2 |                1 |
|  2 |                0 |                0 |
    

In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production.

-- Returns incorrect results. When a person has an alive pet, the records are correct, but excludes people that dont currently have
-- an alive pet, fastest of three...
SELECT 
  people.id,
  COALESCE(SUM(case when pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
  people
  LEFT JOIN pets on people.id = pets.person_id
WHERE 
  pets.alive = true
GROUP BY people.id;

-- Returns correct results, but looks like the execution plan costs a little more.
SELECT people.id,
  (SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'dog') as alive_dogs_count,
  (SELECT COUNT(pets.id) FROM pets WHERE pets.person_id = people.id AND pets.alive = true AND pets.kind = 'cat') as alive_cats_count
FROM people;

-- Returns correct results, but looks like the execution plan costs significantly more.
SELECT 
  people.id,
  COALESCE(SUM(case when pets.alive = true AND pets.kind='dog' then 1 else 0 end),0) as alive_dogs_count,
  COALESCE(SUM(case when pets.alive = true AND pets.kind='cat' then 1 else 0 end),0) as alive_cats_count
FROM
  people
  LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;

How do I get the LEFT JOIN version working?


Solution

  • While fetching all or most rows

    Typically fastest:

    SELECT pp.id
         , COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
         , COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
    FROM   people pp
    LEFT   JOIN (
       SELECT person_id
            , count(*) FILTER (WHERE kind = 'dog') AS a_dog_ct
            , count(*) FILTER (WHERE kind = 'cat') AS a_cat_ct
       FROM   pets
       WHERE  alive
       GROUP  BY 1
       ) pt ON pt.person_id = pp.id;
    

    About the aggregate FILTER clause:

    Indexes are irrelevant here, full table scans will be fastest. Except if alive is a rare case, then a partial index should help. Like:

    CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;
    

    I included all columns needed for the query (person_id, kind) to allow index-only scans.

    fiddle
    Old sqlfiddle

    While fetching a small subset

    Typically fastest:

    SELECT pp.id
         , count(kind = 'dog' OR NULL) AS alive_dogs_count
         , count(kind = 'cat' OR NULL) AS alive_cats_count
    FROM   people pp
    LEFT   JOIN pets pt ON pt.person_id = pp.id
                       AND pt.alive
    WHERE  <some condition to retrieve a small subset>
    GROUP  BY 1;
    

    You should at least have an index on pets(person_id) for this (or the partial index from above) - and possibly more, depending on the WHERE condition.

    Related: