sqlmariadb

One JSON object as a result of SQL recursive CTE


I have one single table with Parent/Child relationship. I need a SQL query that outputs one single JSON object (I am running MariaDB 11.4) I am not familiar with recursive CTE and I would like the result set as shown in the below reference image and the sample JSON object based on the sample data in the table.

Database info is dbfiddle.uk

Tree Diagram of the data

Here is the required query output

{
  "id": 1,
  "account_name": "root",
  "is_group": 1,
  "children": [
    {
      "id": 2,
      "account_name": "assets",
      "is_group": 1,
      "children": [
        {
          "id": 3,
          "account_name": "curr_assets",
          "is_group": 1,
          "children": [
            {
              "id": 4,
              "account_name": "cash",
              "is_group": 0,
              "children": []
            },
            {
              "id": 5,
              "account_name": "bank",
              "is_group": 0,
              "children": []
            }
          ]
        },
        {
          "id": 6,
          "account_name": "fixed_assets",
          "is_group": 1,
          "children": [
            {
              "id": 7,
              "account_name": "Buildings",
              "is_group": 1,
              "children": [
                {
                  "id": 8,
                  "account_name": "shopfloor",
                  "is_group": 0,
                  "children": []
                },
                {
                  "id": 9,
                  "account_name": "offices",
                  "is_group": 0,
                  "children": []
                }
              ]
            }
          ]
        },
        {
          "id": 10,
          "account_name": "stocks",
          "is_group": 0,
          "children": []
        },
        {
          "id": 11,
          "account_name": "furniture",
          "is_group": 1,
          "children": []
        }
      ]
    },
    {
      "id": 12,
      "account_name": "liability",
      "is_group": 1,
      "children": [
        {
          "id": 13,
          "account_name": "payable",
          "is_group": 1,
          "children": []
        }
      ]
    }
  ]
}

Solution

  • It is possible to reach your goal with some points of attention:

    1. It is possible to use aggregate functions in a recursive CTE… with a bit of hack.
      1.1. Specifically, you can use them in a subquery, but only once;
      if putting it in a CTE (recurs as (… union all select … from (with subrecurs as (select … from recurs group by …) select … from subrecurs))), you can only use it once, you will not be able to join twice to subrecurs
      So in essence you have to prepare your data well so that you only use it once on each iteration.
    2. You have to aggregate down to top
    3. 1 iteration = 1 depth
      you cannot use a leafs-first algorithm, because of the nature of recursive CTEs: the query for an iteration only sees rows for the previous iteration, thus when handling assets (depth 1), the previous iteration should have emitted both fixed_assets (a branch at depth 2 but having 2 other levels under) and furniture (at depth 2 too but a leaf): so you cannot start with "every leaf at the bottom of the tree", you have to know when each leaf can join the band
      so you'll need to first compute the depth of each node by going top to down then down to top
    4. Due to 1.1., you cannot have your leafs introduced where leaf.depth = (select max(depth) from previousiteration), because you need previousiteration for your aggregated JSON too (something like candidatesforthisiteration left join previousiteration), and we said you can't refer to previousiteration twice.
      So each leaf should have at least one fictive node of the previous iteration he's the parent of so we can join instead of left join.
      We can't use "any row of the previous iteration, joined just on depth" because then we would have too many joined rows, depending on the number of "real" node rows emitted at previousiteration
    5. … But then those fictive nodes themselves are leaves (they have one parent, which is the real leaf, but no child). We'll need a fictive leaf to introduce them to the tree too. So we'll add another catch-all fictive node at each depth, that, contrary to the other fictive nodes who target a real node with an ID, can be child of "all nodes without an ID"

    Finally, for the JSON part, as MariaDB's JSON really are big strings with the risk of getting as many \ as we have iterations, we'll avoid using JSON aggregate functions, and stay with simple string ones.

    with recursive
    topdown as
    (
      select 0 depth, id, parent from coa where parent is null
      union all
      # We do not just join, we left join, so that each leaf gets a virtual child (with a null id, but with a parent)
      select depth + 1, c.id, p.id from topdown p left join coa c on c.parent = p.id
      where p.id is not null
    ),
    entrypoints as
    (
      select * from topdown
      union all
      # Make sure we have a totally virtual node at each depth (with just a depth, but no parent and no id)
      select depth + 1, null, null from topdown group by depth
    ),
    # Flat JSON contents (do it here instead of in the recursive downtop query, which will be dense).
    contents as
    (
      select depth, coa.id, coa.parent, json_object('id', t.id, 'account_name', account_name, 'is_group', is_group) j
      from topdown t join coa on coa.id = t.id 
    ),
    downtop as
    (
      select depth, id, parent, cast('' as varchar(1048576)) j from entrypoints where depth = (select max(depth) from entrypoints)
      union all
      # Do not use json_insert, which considers our previous iteration produced a string and will quote it entirely.
      select p.depth, p.id, p.parent, concat(substr(pc.j, 1, length(pc.j) - 1), ',"children":[',coalesce(c.j, ''),']}')
      from
      (
          select depth, parent, group_concat(j) j
          from downtop
          group by depth, parent
      ) c
      # Join entry points, either if we're directly referenced as a parent, or if we're a virtual node crafted to be introduced at this depth.
      join entrypoints p on p.depth = c.depth - 1 and (p.id = c.parent or (c.parent is null and p.id is null))
      left join contents pc on pc.id = p.id
    )
    select * from downtop where parent is null and id is not null;
    

    I made it run in a fiddle.