I am trying to populate ElasticSearch with a collection of documents along with a field representing the path to the document based on its parents.
Here is my table layout:
+----+--------+-------+----------+
| Id | Parent | Alias | Contents |
+----+--------+-------+----------+
| 1 | null | Doc1 | Admin |
| 2 | 1 | Doc2 | Use |
| 3 | 2 | Doc3 | Test |
| 4 | 3 | Doc4 | Ask |
| 5 | null | PDF1 | Intro |
| 6 | 5 | PDF2 | Managers |
+----+--------+-------+----------+
Here is the desired output
+----+--------+-------+----------+---------------------+
| Id | Parent | Alias | Contents | Path |
+----+--------+-------+----------+---------------------+
| 1 | null | Doc1 | Admin | Doc1 |
| 2 | 1 | Doc2 | Use | Doc1\Doc2 |
| 3 | 2 | Doc3 | Test | Doc1\Doc2\Doc3 |
| 4 | 3 | Doc4 | Ask | Doc1\Doc2\Doc3\Doc4 |
| 5 | null | PDF1 | Intro | PDF1 |
| 6 | 5 | PDF2 | Managers | PDF1\PDF2 |
+----+--------+-------+----------+---------------------+
I have this query that gets the Path of one document specified by the parameter @child; (aka SET @child = 5;
)
SELECT
T2.*
FROM
(SELECT
@r AS _id,
(SELECT
@r:=Parent
FROM
documents
WHERE
id = _id) AS ParentId,
@l:=@l + 1 AS lvl
FROM
(SELECT @r:=@child, @l:=@parent) vars, documents
WHERE
@r <> 0) T1
JOIN
documents T2 ON T1._id = T2.Id
ORDER BY T2.Parent
The problem being is how do I set @child if I put this into a subquery? I have tried GROUP_CONCAT() but it always ends up being the same path for every line. I have tried putting the Id of the current row in subquery but it throws an error: ErrorCode: 1109. Unknown table 'doc' in field list
in the following query
SELECT doc.*, (
SELECT GROUP_CONCAT(a.Alias) FROM (SELECT
T2.*
FROM
(SELECT
@r AS _id,
(SELECT
@r:=Parent
FROM
documents
WHERE
id = _id) AS ParentId,
@l:=@l + 1 AS lvl
FROM
(SELECT @r:= doc.Id, @l:=@parent) vars, documents
WHERE
@r <> 0) T1
JOIN
documents T2 ON T1._id = T2.Id
ORDER BY T1.lvl DESC) a
) as Path FROM documents doc
What am I doing wrong? Is there a better way to do this that I'm not seeing?
Though it is not entirely relevant, I will point out, I'm using a logstash script to load the documents into ElasticSearch from my database on a schedule. Also for multiplicities sake I have taken out the majority of the columns as well as the contents and replaced with faux contents.
You get your error because you cannot use an outer variable in a derived table. A derived table is basically every "subquery" for which you have to use an alias, like vars
in your case. Try removing that alias, and MySQL will tell you that every derived table has to have an alias.
One way to solve this is to move your whole query into a function, e.g. getpath(child_id int)
, where you can then freely use this variable whereever you want (assuming you have a working query that can get the path for one specific child, "something with GROUP_CONCAT()
").
But in your case, it is actually possible to reorganize your code so you do not need a derived table:
select d.*, t3.path
from (
SELECT t1.id,
group_concat(t2.alias order by t1.rownum desc separator '\\' ) as path
from (
SELECT
current_child.id,
lvls.rownum,
@r := if(lvls.rownum = 1, current_child.id, @r) AS _id,
(SELECT @r:=Parent
FROM documents
WHERE id = _id) AS ParentId
FROM (select @rownum:= @rownum+1 as rownum
from documents, -- maybe add limit 5
(select @rownum := 0) vars
) as lvls
-- or use:
-- (select 1 as rownum union select 2 union select 3
-- union select 4 union select 5) as lvls
straight_join documents as current_child
) as t1
join documents t2
on t2.id = t1._id
group by t1.id
) t3
join documents d
on d.id = t3.id;
I used your inner documents the same way as you did, which is actually quite inefficient and is only used to support an unlimited tree depth. If you know your max dependency level, you could use the alternative code for lvls
I added as a comment (which is just a list of numbers) or the limit
.
Make sure to set the group_concat_max_len
-setting to an appropriate value (with e.g. set session group_concat_max_len = 20000;
). By default, it supports a length of 1024, which will usually be enough, but for long aliases or really deep trees you might reach it - and since it will give you neither an error nor a warning, it is sometimes hard to diagnose, so be aware of it.
There is a more straight forward way to solve your problem. It requires you to know the maximum depth of your tree though, but if you do, you can simply join your parents to every child.
select child.*,
concat_ws('\\',p4.Alias,p3.Alias,p2.Alias,p1.Alias,child.Alias) as path
from documents child
left join documents p1 on child.parent = p1.id
left join documents p2 on p1.parent = p2.id
left join documents p3 on p2.parent = p3.id
left join documents p4 on p3.parent = p4.id;
Generally speaking, the tree you used for your hierarchy does not work very well in sql because of the recursive nature of the model (even if other databases actually support recursive queries in a very similar way you simulated with the variables).
For other ways to model your hierarchy, see e.g. Bill Karwins presentation Models for hierarchical data. They make it a lot easier to query a path without recursion.