sqlpostgresqlviewleft-joinaggregate

Create view with aggregated columns from three levels of nested tables


I'm using Postgres 9.4 and I have 3 tables: truck , container and container_metadata.
A truck can have many container and a container many container_metadata.

I'll add more description later with the tables below (I did my best creating them), so here goes:

enter image description here

A truck can contain many containers referred by key truck_id

enter image description here enter image description here

A container then is described by the container_metadata table, a container can also have many records of different types, but in this case, I will be focusing only in 'subcontainer' type.

The problem I'm trying to solve is to create a view for ease of querying, and model creation (to be used for databrowser). I'll add more details below.

enter image description here

In the last column, all good is true if a truck has:

More details:

I have only been able to count the containers, subcontainers I've been trying out aggregate functions, but I'm new to this problem, I hope the tables help. Please do ask if it needs more clarification.

Query

My attempt so far:

select 
t.id,t.name, count(c.id) as container_count, count(cm.id) as subcontainer_count
from
    public.truck t
left join
    public.container c
on
    c.truck_id = t.id
left join
    public.container_metadata cm
on
    cm.container_id = c.id and type = 'subcontainer'
group by t.id
;

Assuming schema is public. Above is the query I've tried but it outputs wrong number of subcontainers, and that's how far I've got.


Solution

  • Assuming container_metadata.volume to be NOT NULL DEFAULT 0, this should do the complete job:

    SELECT t.id, t.name
         , COALESCE(c.cont_ct, 0)      AS cont_ct
         , COALESCE(c.sub_ct, 0)       AS sub_ct
         , COALESCE(c.empty_ct, 0)     AS empty_ct
         , c.truck_id IS NULL          AS cont_missing
         , c.sub_missing IS NOT FALSE  AS sub_missing
         ,(c.empty_ct = 0) IS NOT TRUE AS sub_needfill
         , c.empty_ct = 0 AND NOT c.sub_missing AS all_good
    FROM   truck t
    LEFT   JOIN (
       SELECT truck_id
            , count(*) AS cont_ct
            , sum(cm.ct_sub) AS sub_ct
            , sum(cm.ct_empty) AS empty_ct
            , bool_or(cm.container_id IS NULL) AS sub_missing
       FROM   container c
       LEFT   JOIN (
          SELECT container_id
               , count(*) AS ct_sub
               , count(*) FILTER (WHERE volume = 0) AS ct_empty
          FROM   container_metadata
          WHERE  type = 'subcontainer'  -- only those seem relevant
          GROUP  BY 1
          ) cm ON cm.container_id = c.id
       GROUP  BY 1
       ) c ON c.truck_id = t.id;
    

    The major feature is to aggregate first and then join to the next upper level. Not only is it typically faster when processing most or all of the table anyway, it also allows to aggregate each level in one fell swoop:

    Be wary of null values. Those can be introduced by column values or by the LEFT JOIN (missing rows). It is essential to observe which columns can be null. The query is only valid for matching table definitions.

    Be sure to understand logical and comparison operators, in particular involving null values.

    The aggregate FILTER clause requires Postgres 9.4: