sqlmysqlrecursioncommon-table-expression

How to use a recursive CTE in a JOIN statement


I'm working on converting an old database and queries to recursive CTEs. In the current code, I'm storing hierarchical data as a dash separated list of 0 padded strings. So as an example, I may have the following forums, with their respective heritage fields

- forum1 (0001)
-- forum4 (0001-0004)
--- forum5 (0001-0004-0005)

The tables also contain a parentID, which I'm trying to build the recursive query off of. For example, I have this query, which grabs all the forums a user is subscribed to, and it's parents:

SELECT
    p.forumID,
    p.title,
    p.parentID,
    p.order,
    IF(s.ID = p.forumID, 1, 0) isSubbed
FROM
    forumSubs s
INNER JOIN forums f ON s.ID = f.forumID
INNER JOIN forums p ON f.heritage LIKE CONCAT(p.heritage, '%')
WHERE
    p.forumID != 0
    AND s.userID = {$userID}
    AND s.`type` = 'f'
ORDER BY
    LENGTH(p.heritage),
    `order`

I created a CTE to get a forum and its parents:

with recursive forum_with_parents (forumID, title, parentID, `order`) as (
  select forumID, title, parentID, `order`
  from       forums
  where      forumID = ?
  union all
  select p.forumID, p.title, p.parentID, p.`order`
  from       forums p
  inner join forum_with_parents
          on p.forumID = forum_with_parents.parentID
)
select * from forum_with_parents;

But it needs a forumID to work. So how could I join against it? I'd figure I'd be replacing the forums p with forums_with_parents, but I don't know how to join against it, because I need that info before I can set the value in the CTE itself. Does the ENTIRE thing have to be a CTE? If so, I'm struggling to think how to do that.


Solution

  • General possibilities

    Your recursive initialization query is standard SQL, so it can absolutely use a JOIN, either with a "real" table, a sub-SELECT table (from forums join (select …) subsel on subsel.forumID = forums.forumID), or a CTE, which would be my preferred (clearest) way of doing a multi-stage query with some recursive parts (you can even have two recursive CTEs in the same with block. But only the latest of the two will be joinable to the results of the first one; you cannot have two recursive queries accessing each other's temporary result set while running, the first one has to be finished before the second one can launch).

    You can mix non-recursive and recursive CTEs in the same with recursive query;
    the recursive keyword then has to be mentioned only once at the start, not before each recursive CTE; the RDBMS will infer which parts run recursively, and which are simple CTEs.
    Thus you can write:

    with recursive
    subsel as (/* not a recursive CTE */),
    forum_with_parents as
    (
        /* Iteration #0: */
        select …
        from forums join subsel on subsel.forumID = forums.forumID
            union all
        /* Recursive iterations: */
        select
        from forums p inner join forum_with_parents …
    )
    …;
    

    Applying to your problem

    If I understand correctly, you are looking to use a recursive CTE to rebuild the hierarchy of forums, of which one user has subscribed to some:
    have the same results as your first (non-recursive) query, that relies on a denormalized heritage column that holds the results of a recursive walkthrough of the tree of forums,
    using only recursive CTEs (and dropping your heritage column that poses maintenance problems).

    We don't even have to use a second CTE (as I told for the general case),
    you can simply replace your = ? by an in (select s.ID from forumSubs s where s.userID = {$userID} and s.`type` = 'f').

    I would suggest an additional modification: use union, not union all:\0 This will automatically handle deduplication, in case the user subscribed to two forums of the same hierarchy, only one of them will appear in the final result
    (if he subscribed to both 4 and 5 for example, both will be detected as direct subscriptions in iteration #0, and then on iteration #1: 4 will fetch its parent 1, and 5 will fetch 4, but as 4 has already been fetched on iteration #0 the recursivity stops propagating it and only continues with 1)

    Thus your query would be (with in (select …) + union without all):

    with recursive
    forum_with_parents (forumID, title, parentID, `order`) as (
      select forumID, title, parentID, `order`
      from       forums
      where      forumID in (select s.ID from forumSubs s where s.userID = {$userID} and s.\`type\` = 'f')
      union
      select p.forumID, p.title, p.parentID, p.`order`
      from       forums p
      inner join forum_with_parents
              on p.forumID = forum_with_parents.parentID
    )
    select * from forum_with_parents;
    

    You can see it (adapted to multiple users) running in a fiddle, with a user 1 having subscribed to both 1 and 5. You can see: