sqlitecorrelated-subquery

Sqlite Multi-column Correlated Subquery: Any better solutions?


I'd like to know if anyone has a solution for this annoying aspect of correlated subqueries:

What if I want to have a correlated sub query return multiple columns ....and.... filter the subquery based on values in the main table?!

ex. A list of student test scores for a set of days:

create table student  
(id, name     ); insert into student values 
(1 , 'joe'    ),
(2 , 'steve'  );

create table testScore
(id, studentId, day  , score); insert into testScore values   
(1 , 1        , 5    , 70   ),
(2 , 1        , 10   , 68   ),
(3 , 1        , 15   , 95   ),
(4 , 1        , 20   , 81   ),
(5 , 2        , 5    , 100  ),
(6 , 2        , 10   , 75   ),
(7 , 2        , 15   , 98   ),
(8 , 2        , 20   , 92   );
create index testScore_studentTopScores on testScore (studentId, score, day); 

If I need to get the top test score for each student (and include the corresponding day numbers, and have the testScore_studentTopScores index be used for speed):

I would love to do this:

select
  student.name,
  topScore.score as topScore,
  topScore.day as topScoreDay
from
  student
  left join (
    select * from 
      testScore 
    where 
      testScore.studentId = student.id
    order by 
      score desc
    limit 1  
  ) as topScore  
order by
  topScore desc;

and have it return:

+-------+----------+-------------+
| name  | topScore | topScoreDay |
+-------+----------+-------------+
| steve | 100      | 5           |
| joe   | 95       | 15          |
+-------+----------+-------------+

But I can't because the from clause expects to run and evaluate all of its subqueries without being able to reference any outer/parent tables. Column subqueries and where clause subqueries however are able to access any tables in the from clause, but they cannot return multiple columns.

So the only way I can think of solving this is to use the nasty duplication of the subquery for every column I want in my results:

select
  student.name,
  
  (
    select 
      score 
    from 
      testScore 
    where 
      testScore.studentId = student.id
    order by 
      score desc
    limit 1  
  )  as topScore,
  
  (select day from (
    select 
      score,
      day
    from 
      testScore 
    where 
      testScore.studentId = student.id
    order by 
      score desc
    limit 1  
  ))  as topScoreDay
  
from
  student
order by
  topScore desc;

I've run into this pattern soooo many times over the years and have never found a nice solution. Am I missing something simple? (I mainly work with Sqlite and MySQL, and neither DB seems to have an acceptable way to do this without the nasty duplication for each column. As I was posting this, I saw that MS SQLServer has an outer apply join operator--something like a left join but where the clause can reference prior tables in the from clause--I guess that's what I want Sqlite to have?)


Solution

  • ok, I had some inspiration after posting to SO (and examining the excellent answers I received, thanks guys!). I think this solution is easy to understand and is usable in any situation like this. It should also be quite efficient because it only needs to do one lookup for each student.

    Basically, the whole problem is that you can't return multiple columns in a column based subquery... Well, I can just jam all the columns I want into one JSON column and then extract them later:

    select 
      topScore.name,
      topScore.topScoreJson ->> 'score' as topScore,
      topScore.topScoreJson ->> 'day' as topScoreDay
    from (  
      select
        student.name,
        (
          select
            json_object(
              'score', testScore.score,
              'day', testScore.day
            )  
          from 
            testScore 
          where 
            testScore.studentId = student.id
          order by 
            testScore.score desc
          limit 1  
        ) as topScoreJson
      from
        student
    ) as topScore   
    order by
      topScore desc;