I'd like to know if anyone has a solution for this annoying aspect of correlated subqueries:
What if I want to have a correlated sub query return multiple columns ....and.... filter the subquery based on values in the main table?!
ex. A list of student test scores for a set of days:
create table student
(id, name ); insert into student values
(1 , 'joe' ),
(2 , 'steve' );
create table testScore
(id, studentId, day , score); insert into testScore values
(1 , 1 , 5 , 70 ),
(2 , 1 , 10 , 68 ),
(3 , 1 , 15 , 95 ),
(4 , 1 , 20 , 81 ),
(5 , 2 , 5 , 100 ),
(6 , 2 , 10 , 75 ),
(7 , 2 , 15 , 98 ),
(8 , 2 , 20 , 92 );
create index testScore_studentTopScores on testScore (studentId, score, day);
If I need to get the top test score for each student (and include the corresponding day numbers, and have the testScore_studentTopScores
index be used for speed):
I would love to do this:
select
student.name,
topScore.score as topScore,
topScore.day as topScoreDay
from
student
left join (
select * from
testScore
where
testScore.studentId = student.id
order by
score desc
limit 1
) as topScore
order by
topScore desc;
and have it return:
+-------+----------+-------------+
| name | topScore | topScoreDay |
+-------+----------+-------------+
| steve | 100 | 5 |
| joe | 95 | 15 |
+-------+----------+-------------+
But I can't because the from
clause expects to run and evaluate all of its subqueries without being able to reference any outer/parent tables. Column subqueries and where
clause subqueries however are able to access any tables in the from
clause, but they cannot return multiple columns.
So the only way I can think of solving this is to use the nasty duplication of the subquery for every column I want in my results:
select
student.name,
(
select
score
from
testScore
where
testScore.studentId = student.id
order by
score desc
limit 1
) as topScore,
(select day from (
select
score,
day
from
testScore
where
testScore.studentId = student.id
order by
score desc
limit 1
)) as topScoreDay
from
student
order by
topScore desc;
I've run into this pattern soooo many times over the years and have never found a nice solution. Am I missing something simple? (I mainly work with Sqlite and MySQL, and neither DB seems to have an acceptable way to do this without the nasty duplication for each column. As I was posting this, I saw that MS SQLServer has an outer apply
join operator--something like a left join
but where the clause can reference prior tables in the from clause--I guess that's what I want Sqlite to have?)
ok, I had some inspiration after posting to SO (and examining the excellent answers I received, thanks guys!). I think this solution is easy to understand and is usable in any situation like this. It should also be quite efficient because it only needs to do one lookup for each student.
Basically, the whole problem is that you can't return multiple columns in a column based subquery... Well, I can just jam all the columns I want into one JSON column and then extract them later:
select
topScore.name,
topScore.topScoreJson ->> 'score' as topScore,
topScore.topScoreJson ->> 'day' as topScoreDay
from (
select
student.name,
(
select
json_object(
'score', testScore.score,
'day', testScore.day
)
from
testScore
where
testScore.studentId = student.id
order by
testScore.score desc
limit 1
) as topScoreJson
from
student
) as topScore
order by
topScore desc;