mysqlsqlaggregate-functionsrow-value-expression

SQL: tuple comparison


In my current application, I need to be able to do this type of query:

SELECT MIN((colA, colB, colC)) 
FROM mytable
WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')

and get the answer of (333, 'B', 'B'), given this data:

+------+------+------+
| colA | colB | colC |
+------+------+------+
|   99 | A    | A    |
|  200 | A    | Z    |
|  200 | B    | B    |
|  333 | B    | B    |
|  333 | C    | D    |
|  333 | C    | E    |
|  333 | D    | C    |
| 1000 | E    | G    |
| 1000 | F    | A    |
+------+------+------+

What is the most efficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of million of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.

If the solution is easily extensible to more/less columns, that's even better.


Tuple Comparison:

Several have asked so I should put this in the question. Tuples are ordered lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.

It's worth noting that SQL (or at least mysql) implements this correctly:

mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
+--------------------------------------------------------------------------+
| (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
+--------------------------------------------------------------------------+
|                                                                        1 |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

Here's the necessary SQL to create the example:

create table mytable select 333 colA, 'B' colB, 'B' colC;
insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'), 
    (200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
    (99, 'A', 'A');
alter table mytable add unique index myindex (colA, colB, colC);

Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.


Solution

  • Just do:

    SELECT colA
         , colB
         , colC
    FROM mytable
    WHERE ( ('A',  'B',  'C') <= (colA, colB, colC ) )
      AND ( (colA, colB, colC) <= ('D',  'E',  'F' ) )
    ORDER BY colA, colB, colC
    LIMIT 1
    ;
    

    It works just fine. And I suspect is should be pretty fast, too.


    This is equivalent but it may have better performance, depending on your tables:

    SELECT m.colA
         , m.colB
         , m.colC
    FROM mytable m
    WHERE ( ('A',  'B',  'C') <= (m.colA, m.colB, m.colC) )
      AND ( (m.colA, m.colB, m.colC) <= ('D',  'E',  'F') )
      AND NOT EXISTS
      ( SELECT 1
        FROM mytable b
        WHERE (b.colA, b.colB, b.colC) < (m. colA, m.colB, m.colC)
          AND ( ('A',  'B',  'C') <= (b.colA, b.colB, b.colC) )
      );