sql postgresql

How can I find the highest ancestor in a tree?


I have the following two SQL tables:

CREATE TABLE tag_hierarchies (
    ancestor_id integer NOT NULL,
    descendant_id integer NOT NULL,
    generations integer NOT NULL
);

where a generations value of 0 is the row that links a tag to itself (a root has only that row), and

CREATE TABLE tags (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR
);

I'm unable to generate a query that would return the lowest generation (ie. highest ancestor) per tree. Example trees: a/b/c/d and e would be represented in the database as:

tag_hierarchies

ancestor_id  descendant_id  generations
84           84             0
85           85             0
84           85             1
86           86             0
85           86             1
84           86             2
87           87             0
86           87             1
85           87             2
84           87             3
88           88             0

and

tags

id name
84 a
85 b
86 c
87 d
88 e
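
For anyone who wants to reproduce this, the rows above can be loaded with something like the following. It is only a sketch: the ids are inserted explicitly so that they match the example, which is an assumption about how the data was created.

```sql
-- Sample tags; explicit ids are used so they match the example rows.
-- Note: inserting explicit ids does not advance the BIGSERIAL sequence.
INSERT INTO tags (id, name) VALUES
    (84, 'a'), (85, 'b'), (86, 'c'), (87, 'd'), (88, 'e');

-- Closure rows for the trees a/b/c/d and e (one row per ancestor/descendant pair,
-- including the generations = 0 row that links each tag to itself).
INSERT INTO tag_hierarchies (ancestor_id, descendant_id, generations) VALUES
    (84, 84, 0),
    (85, 85, 0), (84, 85, 1),
    (86, 86, 0), (85, 86, 1), (84, 86, 2),
    (87, 87, 0), (86, 87, 1), (85, 87, 2), (84, 87, 3),
    (88, 88, 0);
```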

I want a query that takes a base scope of tags and returns, for each tree, the unique tag with the lowest generation (i.e. the highest ancestor) within that scope. For example, if I run it with b, c and e it should return b and e: b sits above c in their shared tree, and e is already a root.


Solution

  • Assuming that

    "...lowest generation (ie. highest ancestor) per tree ..."

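    refers to how deep in the hierarchy an id appears as a descendant, you could create a cte (grid) that returns, for each tag, its depth (lowest_generation) and the root of its tree (root_ancestor). Note that Min(h.ancestor_id) identifies the root only if a root always has a smaller id than its descendants, as it does in the sample data: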

    Updated code (after clarifications in the comments):

    WITH
      grid AS
        ( Select     t.id, t.name, 
                     Max(h.generations) as lowest_generation,
                     Case When Max(h.generations) > 0 Then Min(h.ancestor_id) Else t.id End as root_ancestor
          From       tags t
          Inner Join tag_hierarchies h ON( h.descendant_id = t.id )
          Group By   t.id, t.name
        ) 
    --    Checking grid resultset 
    Select     g.*
    From       grid  g
    Order By   g.lowest_generation, g.id
    
    /*      R e s u l t :
    id  name    lowest_generation  root_ancestor
    --  ------  -----------------  -------------
    84  a                       0             84
    88  e                       0             88
    85  b                       1             84
    86  c                       2             84
    87  d                       3             84        */
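
    If ids are not guaranteed to increase down the tree, the root can instead be read from the closure row with the largest generations value. A minimal PostgreSQL-only sketch of that idea (DISTINCT ON is not standard SQL), assuming every tag has exactly one root and the table holds a row for each ancestor, including the self row:

    ```sql
    -- DISTINCT ON keeps one row per tag; the ORDER BY makes that row the
    -- closure entry with the most generations, whose ancestor_id is the root.
    SELECT DISTINCT ON (t.id)
           t.id,
           t.name,
           h.generations AS lowest_generation,
           h.ancestor_id AS root_ancestor
    FROM   tags t
    JOIN   tag_hierarchies h ON h.descendant_id = t.id
    ORDER  BY t.id, h.generations DESC;
    ```

    On the sample data this yields the same lowest_generation and root_ancestor values as the grid cte, ordered by id.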
    

    ... now if you want to do it on specific tags as mentioned in the question ...

    As an example if I run it with b, c & e it would return b & e. Since b has a lower generation between b & c and e is already a root.

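    ... then put your tags (b, c, e) as a filter in the Where clause of the grid cte, and build the Where clause of the main sql so that, for each root_ancestor, only the row(s) with the smallest lowest_generation are kept. A root in the filter is kept as itself (generation 0); otherwise the shallowest filtered tag of that tree (generation greater than 0) is kept ...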

    WITH
      grid AS
        ( Select     t.id, t.name, 
                     Max(h.generations) as lowest_generation,
                     Case When Max(h.generations) > 0 Then Min(h.ancestor_id) Else t.id End as root_ancestor
          From       tags t
          Inner Join tag_hierarchies h ON( h.descendant_id = t.id )
          Where      t.name IN('b', 'c', 'e')
          Group By   t.id, t.name
        ) 
    
    --    M a i n    S Q L :
    Select     g.id, g.name
    From       grid   g  
    Where      g.lowest_generation IN(Select Min(lowest_generation) 
                                      From grid 
                                      Where root_ancestor = g.root_ancestor
                                      Group By root_ancestor)
    Order By   g.id
    
    /*      R e s u l t :
            (for grid Where clause)  Where      t.name IN('b', 'c', 'e')
    id  name
    --  ----
    85  b
    88  e        */

    /*      R e s u l t :
            for grid with no Where clause
    id  name
    --  ----
    84  a
    88  e        */
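
    The correlated subquery in the main sql can also be written with a window function. The sketch below reuses the grid cte unchanged and adds a second cte, named ranked here purely for illustration, that ranks the filtered tags inside each tree by depth:

    ```sql
    WITH
      grid AS
        ( SELECT   t.id, t.name,
                   MAX(h.generations) AS lowest_generation,
                   CASE WHEN MAX(h.generations) > 0 THEN MIN(h.ancestor_id) ELSE t.id END AS root_ancestor
          FROM     tags t
          JOIN     tag_hierarchies h ON h.descendant_id = t.id
          WHERE    t.name IN ('b', 'c', 'e')
          GROUP BY t.id, t.name
        ),
      ranked AS
        ( -- rnk = 1 marks the shallowest filtered tag(s) of each tree
          SELECT   g.id, g.name,
                   RANK() OVER (PARTITION BY g.root_ancestor
                                ORDER BY g.lowest_generation) AS rnk
          FROM     grid g
        )
    SELECT   id, name
    FROM     ranked
    WHERE    rnk = 1
    ORDER BY id;
    -- With the sample data this returns 85 b and 88 e.
    ```

    RANK() keeps ties, just as the IN (Select Min(...)) version does, so two filtered tags at the same depth under one root would both be returned.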
    

    See the fiddle here.
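    NOTE:
    The grid cte and the main sql use only standard syntax (a cte, Group By and a correlated subquery), so the same code runs on both MySQL (8.0 or later, which added cte support) and PostgreSQL, as well as on almost every other dialect (Oracle, SQL Server, SQLite, MariaDB, ...)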