I am totally new to MySQL and appreciate very much for your patience. I have the following MySQL table:
fact_id | from_id | relation_id | to_id | link_time |
---|---|---|---|---|
1 | 19 | 6 | 151 | 1 |
2 | 2233 | 2 | 57 | 1 |
3 | 182 | 23 | 112 | 1 |
4 | 22 | 17 | 21 | 1 |
5 | 3 | 8 | 742 | 1 |
6 | 507 | 2 | 55 | 1 |
7 | 154 | 25 | 56 | 1 |
8 | 100 | 83 | 18 | 1 |
9 | 1110 | 2 | 31 | 1 |
10 | 141 | 29 | 7 | 1 |
... | ... | ... | ... | ... |
with the corresponding init code:
create table icews14s
(
fact_id int auto_increment
primary key,
from_id int not null,
relation_id int not null,
to_id int not null,
link_time int not null
);
create index icews14s_from_id_index
on icews14s (from_id);
create index icews14s_link_time_index
on icews14s (link_time);
create index icews14s_to_id_index
on icews14s (to_id);
and one long query:
with target as (select fact_id, from_id, link_time from icews14s where fact_id = 69298),
pre_nodes_1 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)),
pre_nodes_2 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
2 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_1)),
pre_nodes_3 as (select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
3 as degree
from icews14s o
left join (select * from icews14s) t on o.from_id = t.to_id and o.link_time < t.link_time
where t.fact_id in (select fact_id from pre_nodes_2))
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree from icews14s where fact_id = 69298
union select * from pre_nodes_1
union select * from pre_nodes_2
union select * from pre_nodes_3
order by fact_id desc limit 30;
the corresponding result is :
fact_id | from_id | relation_id | to_id | link_time | degree |
---|---|---|---|---|---|
69298 | 1659 | 16 | 269 | 285 | 0 |
60977 | 1659 | 37 | 3176 | 253 | 1 |
58981 | 3176 | 1 | 3281 | 245 | 2 |
58757 | 1659 | 8 | 1884 | 245 | 1 |
58722 | 1659 | 0 | 1884 | 245 | 1 |
39282 | 1659 | 1 | 105 | 163 | 1 |
38740 | 105 | 7 | 1143 | 161 | 2 |
38570 | 105 | 29 | 1815 | 161 | 2 |
38440 | 105 | 19 | 2 | 160 | 2 |
38101 | 2 | 28 | 52 | 159 | 3 |
38061 | 105 | 0 | 581 | 158 | 2 |
37825 | 2 | 14 | 1057 | 157 | 3 |
37822 | 2 | 2 | 228 | 157 | 3 |
37606 | 2 | 2 | 1006 | 156 | 3 |
37597 | 2 | 9 | 9 | 156 | 3 |
37554 | 2 | 2 | 9 | 156 | 3 |
37390 | 2 | 99 | 9 | 156 | 3 |
37322 | 2 | 8 | 48 | 155 | 3 |
37277 | 2 | 2 | 9 | 155 | 3 |
37266 | 2 | 9 | 1068 | 155 | 3 |
37210 | 2 | 8 | 239 | 155 | 3 |
37120 | 2 | 2 | 90 | 155 | 3 |
37032 | 2 | 2 | 9 | 154 | 3 |
36993 | 2 | 8 | 28 | 154 | 3 |
36988 | 2 | 71 | 136 | 154 | 3 |
36971 | 2 | 2 | 90 | 154 | 3 |
36949 | 2 | 9 | 48 | 154 | 3 |
36896 | 2 | 29 | 9 | 154 | 3 |
36827 | 2 | 28 | 309 | 154 | 3 |
36798 | 2 | 10 | 52 | 153 | 3 |
The question is, this query is very slow in such a 100000 record table.
Is there any methods to make it quicker?
I tried to solve using recrusive temp table, but this statement never ends:
WITH RECURSIVE pre_nodes AS (
SELECT
fact_id,
from_id,
relation_id,
to_id,
link_time,
0 AS degree
FROM
icews14s
WHERE
fact_id = 69298
UNION
SELECT
o.fact_id,
o.from_id,
o.relation_id,
o.to_id,
o.link_time,
n.degree + 1
FROM
icews14s o
JOIN
pre_nodes n ON IF(n.degree = 0, (o.from_id = n.from_id AND o.link_time < n.link_time),
(o.from_id = n.to_id AND o.link_time < n.link_time))
WHERE
o.fact_id != 69298
)
SELECT distinct
fact_id,
from_id,
relation_id,
to_id,
link_time,
degree
FROM
pre_nodes
ORDER BY
fact_id DESC
LIMIT
30;
Your first query can be simplified quite a bit. Let's take the pre_nodes_1
cte as an example:
pre_nodes_1 as (
select o.fact_id as fact_id,
o.from_id as from_id,
o.relation_id as relation_id,
o.to_id as to_id,
o.link_time as link_time,
1 as degree
from icews14s o
left join (select * from icews14s) t
on o.from_id = t.from_id
and o.link_time < t.link_time
where t.fact_id in (select fact_id from target)
)
As O. Jones pointed out in the comments, the nesting of (select * from icews14s)
is unnecessary, but the optimizer should spot this and remove the nesting. The left join
will be implicitly turned into an inner join
due to the criterion on t.fact_id
. Given the t.fact_id in (select fact_id from target)
, I think the cte can be rewritten as:
pre_nodes_1 as (
select o.*, 1 as degree
from icews14s o
join target t
on o.from_id = t.from_id and o.link_time < t.link_time
)
The same pattern applies to the other two pre_nodes_*
ctes. Given the o.link_time < t.link_time
criterion in each of the joins, it should not be necessary to dedupe the UNION
, so switching to UNION ALL
will reduce overhead by removing the dedupe step.
with target as (
select fact_id, from_id, link_time from icews14s where fact_id = 69298
),
pre_nodes_1 as (
select o.*, 1 as degree
from icews14s o
join target t
on o.from_id = t.from_id and o.link_time < t.link_time
),
pre_nodes_2 as (
select o.*, 2 as degree
from icews14s o
join pre_nodes_1 t
on o.from_id = t.to_id and o.link_time < t.link_time
),
pre_nodes_3 as (
select o.*, 3 as degree
from icews14s o
join pre_nodes_2 t
on o.from_id = t.to_id and o.link_time < t.link_time
)
select fact_id, from_id, relation_id, to_id, link_time, 0 as degree
from icews14s
where fact_id = 69298
union all
select * from pre_nodes_1
union all
select * from pre_nodes_2
union all
select * from pre_nodes_3
order by fact_id desc
limit 30;
This query will probably benefit from a composite index on (from_id, link_time)
:
alter table icews14s
add index idx_from_id_link_time (from_id, link_time);
For your recursive cte, I suspect the performance issue is due to the join condition. Without a reasonable amount of test data I cannot test this, but I suspect rewriting the recursive part of the cte like this may allow it to use an index:
SELECT o.*, n.degree + 1
FROM pre_nodes n
JOIN icews14s o
ON o.from_id = IF(n.degree = 0, n.from_id, n.to_id)
AND o.link_time < n.link_time
AND o.fact_id <> 69298