I have a very complex query which uses some subqueries within a CASE statement.
For this question the complete query isn't needed and would just prevent people from getting into the problem quick.
So this post uses pseudocode to work with. If wanted I could post the query but it is a monster and of no use for this question.
What I want are cacheable subqueries within the CASE statement.
SELECT * FROM posts posts
INNER JOIN posts_shared_to shared_to
ON shared_to.post_id = posts.post_id
INNER JOIN channels.channels
ON channels.channel_id = shared_to.channel_id
WHERE posts.parent_id IS NULL
AND MATCH (post.text) AGAINST (:keyword IN BOOLEAN MODE)
AND CASE(
WHEN channel.read_access IS NULL THEN 1
WHEN channel.read_access = 1 THEN
(
SELECT count(*) FROM channel_users
WHERE user_id = XXX AND channel_id = channels.channel_id
)
WHEN shared_to.read_type = 2 THEN
(
/* another subquery with a join */
/* check if user is in friendlist of post_author */
)
ELSE 0
END;
)
GROUP BY post.post_id
ORDER BY post.post_id
DESC LIMIT n,n
As statet above this is just a simplified pseudocode.
MySql EXPLAIN says that all used subqueries within the CASE are DEPENDANT which means ( if I'm correct ) that they need to run everytime and aren't cached.
Any solution that helps speeding up this query is welcome.
EDITED PART: Now the true query looks like this:
SELECT a.id, a.title, a.message AS post_text, a.type, a.date, a.author AS uid,
b.a_name as name, b.avatar,
shared_to.to_circle AS circle_id, shared_to.root_circle,
c.circle_name, c.read_access, c.owner_uid, c.profile,
MATCH(a.title,a.message) AGAINST (:keyword IN BOOLEAN MODE) AS score
FROM posts a
/** get userdetails for post_author **/
JOIN authors b ON b.id = a.author
/** get circles posts was shared to **/
JOIN posts_shared_to shared_to ON shared_to.post_id = a.id AND shared_to.deleted IS NULL
/**
* get circle_details note: at the moment shared_to can contain NULL and 1 too and doesnt need to be a circle_id
* if to_circle IS NULL post was shared public
* if to_circle = 1 post was shared to private circles
* since we use md5 keys as circle ids this can be a string insetad of (int) ... ugly..
*
**/
LEFT JOIN circles c ON c.circle_id = shared_to.to_circle
/*AND c.circle_name IS NOT NULL */
AND ( c.profile IS NULL OR c.profile = 6 OR c.profile = 1 )
AND c.deleted IS NULL
LEFT JOIN (
/** if post is within a channel that requires membership we use this to check if requesting user is member **/
SELECT COUNT(*) users_count, user_id, circle_id FROM circles_users
GROUP BY user_id, circle_id
) counts ON counts.circle_id = shared_to.to_circle
AND counts.user_id = :me
LEFT JOIN (
/** if post is shared private we check if requesting users exists within post authors private circles **/
SELECT count(*) in_circles_count, ci.owner_uid AS circle_owner, cu1.user_id AS user_me
FROM circles ci
INNER JOIN circles_users cu1 ON cu1.circle_id = ci.circle_id
AND cu1.deleted IS NULL
WHERE ci.profile IS NULL AND ci.deleted IS NULL
GROUP BY user_me, circle_owner
) users_in_circles ON users_in_circles.user_me = :me
AND users_in_circles.circle_owner = a.id
/** make sure post is a topic **/
WHERE a.parent_id IS NULL AND a.deleted IS NULL
/** search title and post body **/
AND MATCH (a.title,a.message) AGAINST (:keyword IN BOOLEAN MODE)
AND (
/** own circle **/
c.owner_uid = :me
/** site member read_access ( this query is for members, for guests we use a different query ) **/
OR ( c.read_access = 1 OR c.read_access = "1" )
/** public read_access **/
OR ( shared_to.to_circle IS NULL OR ( c.read_access IS NULL AND c.owner_uid IS NOT NULL ) )
/** channel/circle member read_access**/
OR ( c.read_access = 3 OR c.read_access = "3" AND counts.users_count > 0 )
/** for users within post creators private circles **/
OR (
(
/** use shared_to to determine if post is private **/
shared_to.to_circle = "1" OR shared_to.to_circle = 1
/** use circle settings to determine global privacy **/
OR ( c.owner_uid IS NOT NULL AND c.read_access = 2 OR c.read_access = "2" )
) AND users_in_circles.circle_owner = a.author AND users_in_circles.user_me = :me
)
)
GROUP BY a.id ORDER BY a.id DESC LIMIT n,n
Question: Is this really the better way? If I look at how many rows the derived tables can contain I'm not sure about it.
And maybe someone can help me changing the query like mentioned by @Ollie-Jones :
SELECT stuff, stuff, stuff
FROM (
SELECT post.post_id
FROM your whole query
ORDER BY post_id DESC
LIMIT n,n
) ids
JOIN whatever ON whatever.post_id = ids.post_id
JOIN whatelse ON whatelse
Sry if this sound slazy but I'm not really a mysqlguy and I got headaches for years just from building this query. :D
The best way to eliminate your dependent subquery is to refactor it so it's a virtual table (an independent subquery), then JOIN or LEFT JOIN it to the rest of your tables.
In your CASE, you have
SELECT count(*) FROM channel_users
WHERE user_id = XXX AND channel_id = channels.channel_id
So, the independent-subquery casting of this is
SELECT COUNT(*) users_count,
user_id, channel_id
FROM channel_users
GROUP BY user_id, channel_id
Do you see how that virtual table contains one row for each distinct combination of user_id
and channel_id
values? Each row has the users_count
value you need. You can then JOIN that into the rest of your query, like so. (Notice that INNER JOIN === JOIN in MySQL, so I used JOIN to shorten it a bit.)
SELECT * FROM posts posts
JOIN posts_shared_to shared_to ON shared_to.post_id = posts.post_id
JOIN channels.channels ON channels.channel_id = shared_to.channel_id
LEFT JOIN (
SELECT COUNT(*) users_count,
user_id, channel_id
FROM channel_users
GROUP BY user_id, channel_id
) counts ON counts.channel_id = shared_to.channel_id
AND counts.user_id = channels.user_id
LEFT JOIN ( /* your other refactored subquery */
) friendcounts ON whatever
WHERE posts.parent_id IS NULL
AND channels.user_id = XXX
AND MATCH (post.text) AGAINST (:keyword IN BOOLEAN MODE)
AND ( channel.read_access IS NULL
OR (channel.read_access = 1 AND counts.users_count > 0)
OR (shared_to.read_type = AND friendcount.users_count > 0)
)
GROUP BY post.post_id
ORDER BY post.post_id DESC
LIMIT n,n
The MySQL query planner is generally smart enough to generate an appropriate subset of each independent subquery.
Pro tip: SELECT lots of columns ... ORDER BY something LIMIT n
is generally considered a wasteful antipattern. It kills performance because it sorts a whole bunch of columns of data then discards most of the result.
Pro tip: SELECT *
in JOIN query is also wasteful. You are much better off if you give a list of the columns you actually need in your result set.
So, you can refactor your query again to do
SELECT stuff, stuff, stuff
FROM (
SELECT post.post_id
FROM your whole query
ORDER BY post_id DESC
LIMIT n,n
) ids
JOIN whatever ON whatever.post_id = ids.post_id
JOIN whatelse ON whatelse.
The idea is to only sort the post_id
values, then use the LIMITed subset to pull the rest of the data you need.