postgresqlcommon-table-expressionrecursive-query

Traversing graph using recursive CTE - using "array" to store visited vertices


I'm studying graph traversal using recursive CTEs (learnSQL course). The question asks about visiting every city from a starting city. They use a "path" string built up from appended city names and then using where position(city.name IN travel.path) = 0 to see if the city has already been visited. That works but I often look for alternatives and I could use an array of visited cities.

As an experienced dev (e.g. Java but any imperative language) I'd represent visited as a SET for efficient containment testing, but postgresql only seems to offer ARRAY.

  1. Is there a SET type (something simple, not a separate table or JSON set) I could use?
  2. Is the ARRAY efficient in terms of lookup?

The example data is, of course, tiny, and isn't a problem but I'd like to know what's most efficient/best practise. String lookup using position isn't terribly semantic IMO.

Thanks!

with recursive
travel(path, visited, visited_count) AS (
  select
    name::text,
    array[name::text],
    1
  
  from city
  where
    name = 'Bristol'
  
  union all
  
  select
    travel.path || '->' || city.name,
    visited || name::text,
    visited_count + 1
  
  from city, travel
  
  where
    city.name <> all(visited)
    --position(city.name IN travel.path) = 0
)
  
select
  path,
  visited

from travel

where
  visited_count = 6


Solution

  • Use the ARRAY approach for semantic clarity, as both array checks (<> ALL) and string searches (position()) have similar O(N) lookup performance within the recursive step.

    WITH RECURSIVE travel(path, visited, visited_count) AS (
      -- base case
      SELECT name::text, ARRAY[name::text], 1
      FROM city WHERE name = 'Bristol'
    
      UNION ALL
    
      -- recursive step
      SELECT
        t.path || '->' || c.name, -- build path string
        t.visited || c.name::text, -- append to visited array (o(n) copy + append)
        t.visited_count + 1
      FROM city c, travel t
      WHERE c.name <> ALL(t.visited) -- check if visited (o(n) array scan)
    )
    SELECT path, visited
    FROM travel
    WHERE visited_count = (SELECT count(*) FROM city); -- condition to stop (e.g., all cities visited)