sqlmysqlsql-serveroraclejoin

Joining tables based on the maximum value


Here's a simplified example of what I'm talking about:

Table: students      exam_results
_____________       ____________________________________
| id | name |       | id | student_id | score |   date |
|----+------|       |----+------------+-------+--------|
|  1 | Jim  |       |  1 |          1 |    73 | 8/1/09 | 
|  2 | Joe  |       |  2 |          1 |    67 | 9/2/09 |
|  3 | Jay  |       |  3 |          1 |    93 | 1/3/09 |
|____|______|       |  4 |          2 |    27 | 4/9/09 |
                    |  5 |          2 |    17 | 8/9/09 |
                    |  6 |          3 |   100 | 1/6/09 |
                    |____|____________|_______|________|

Assume, for the sake of this question, that every student has at least one exam result recorded.

How would you select each student along with their highest score? Edit: ...AND the other fields in that record?

Expected output:

_________________________
| name | score |   date |
|------+-------|--------|
|  Jim |    93 | 1/3/09 |
|  Joe |    27 | 4/9/09 |
|  Jay |   100 | 1/6/09 |
|______|_______|________|

Answers using all types of DBMS are welcome.


Solution

  • Answering the EDITED question (i.e. to get associated columns as well).

    In Sql Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:

    with exam_data as
    (
        select  r.student_id, r.score, r.date,
                row_number() over(partition by r.student_id order by r.score desc) as rn
        from    exam_results r
    )
    select  s.name, d.score, d.date, d.student_id
    from    students s
    join    exam_data d
    on      s.id = d.student_id
    where   d.rn = 1;
    

    For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:

    select  s.name, r.student_id, r.score, r.date
    from    (
                select  r.student_id, max(r.score) as max_score
                from    exam_results r
                group by r.student_id
            ) d
    join    exam_results r
    on      r.student_id = d.student_id
    and     r.score = d.max_score
    join    students s
    on      s.id = r.student_id;
    

    This last one assumes there aren't duplicate student_id/max_score combinations, if there are and/or you want to plan to de-duplicate them, you'll need to use another subquery to join to with something deterministic to decide which record to pull. For example, assuming you can't have multiple records for a given student with the same date, if you wanted to break a tie based on the most recent max_score, you'd do something like the following:

    select  s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
    from    (
                select  r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
                from    (
                            select  r1.student_id, max(r1.score) as max_score
                            from    exam_results r1
                            group by r1.student_id
                        ) d
                join    exam_results r2
                on      r2.student_id = d.student_id
                and     r2.score = d.max_score
                group by r2.student_id, r2.score
            ) r
    join    exam_results r3
    on      r3.student_id = r.student_id
    and     r3.score = r.max_score
    and     r3.date = r.max_score_max_date
    join    students s
    on      s.id = r3.student_id;
    

    EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments