mysqloptimizationquery-optimizationsqlperformance

MySQL optimization/performance, how to use limit efficiently | location based selection


The following shortened query selects all rows (entrys) inside a given distance (calculated from user e.altloc = 0: location or e.altloc = 1: altlocation ).

I have indexes on e.uid, al.eid, e.country, e.tmstmp and the id's are primary keys.

The problem according to explain all rows are need to process the query, instead of 2 rows which I like with limit 2.

I have read this question, but I'm not able to do the Limit before using a join, because I need to join the location tables before I can do the limit 2 else the return would be wrong. https://dba.stackexchange.com/questions/52079/does-using-limit-improve-the-performance-and-is-it-noticeable

The query:

SELECT 
        e.id, e.uid, e.title, e.description, l.place, l.placenonce, al.altplace, al.altplacenonce,
        IF(e.altloc=0,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(lat, UNHEX('###'), latnonce) ) ) * cos( radians( AES_DECRYPT(lng, UNHEX('###'), lngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(lat, UNHEX('###'), latnonce))) ) ,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(altlat, UNHEX('###'), altlatnonce) ) ) * cos( radians( AES_DECRYPT(altlng, UNHEX('###'), altlngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(altlat, UNHEX('###'), altlatnonce))) )
        ) AS distance
    FROM 
        entrys e 
    INNER JOIN 
        location l 
        ON l.id = e.uid 
    LEFT JOIN
        altlocation al
        ON al.eid = e.id
    WHERE 
        IF(:border = 0, e.country = :countryid, e.country != 0 )    
    HAVING 
        distance <= 50
    ORDER BY 
        e.tmstmp 
    DESC
    LIMIT 2

A second example with a fixed location:

SELECT 
    s.id, s.image, s.description, s.title,      
    ( 
        6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(l.lat, :key, l.latnonce) ) ) * cos( radians( AES_DECRYPT(l.lng, :key, l.lngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(l.lat, :key, l.latnonce))) ) 
    ) AS distance
FROM 
    sponsors s 
INNER JOIN 
    location l 
    ON l.id = s.id 
WHERE 
    s.comp = 1 OR s.comp = 3 AND s.active = 1
HAVING 
    distance <= 50
ORDER BY
    s.rotate
ASC
LIMIT 2

How to improve this location based querys, if there are million of rows in my database? I need to output only 2 rows of each query.

Create table for the first example:

  CREATE TABLE `entrys` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `uid` int(5) NOT NULL,
 `tmstmp` bigint(11) NOT NULL,
 `approx_lat` mediumint(9) NOT NULL,
 `approx_lng` mediumint(9) NOT NULL,
 `altloc` tinyint(4) NOT NULL,
 `title` varchar(70) COLLATE latin1_general_ci NOT NULL,
 `description` text COLLATE latin1_general_ci NOT NULL,
 `country` tinyint(4) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `uid` (`uid`),
 KEY `tmstmp` (`tmstmp`),
 KEY `country` (`country`),
) ENGINE=MyISAM CHARSET=latin1 COLLATE=latin1_general_ci

CREATE TABLE `location` (
 `id` int(5) NOT NULL,
 `lat` varbinary(50) NOT NULL,
 `latnonce` varbinary(25) NOT NULL,
 `lng` varbinary(50) NOT NULL,
 `lngnonce` varbinary(25) NOT NULL,
 `place` tinyblob NOT NULL,
 `placenonce` tinyblob NOT NULL,
 UNIQUE KEY `id` (`id`),
 KEY `lat` (`lat`),
 KEY `lng` (`lng`)
) 

CREATE TABLE `altlocation` (
 `id` int(5) NOT NULL,
 `eid` int(5) NOT NULL,
 `altlat` varbinary(50) NOT NULL,
 `altlatnonce` varbinary(25) NOT NULL,
 `altlng` varbinary(50) NOT NULL,
 `altlngnonce` varbinary(25) NOT NULL,
 `altplace` tinyblob NOT NULL,
 `altplacenonce` tinyblob NOT NULL,
 UNIQUE KEY `eid` (`eid`),
 KEY `altlat` (`altlat`),
 KEY `altlng` (`altlng`)
)

sidenote: the engine for entrys should probably be innodb, having ~70% read. The location tables both running with innodb.

EDIT question to Willem Renzema for his answer:

Would it be more efficient like that?

SELECT 
        e.id, e.uid, e.title, e.description, l.place, l.placenonce, al.altplace, al.altplacenonce,
        IF(e.altloc=0,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(lat, UNHEX('###'), latnonce) ) ) * cos( radians( AES_DECRYPT(lng, UNHEX('###'), lngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(lat, UNHEX('###'), latnonce))) ) ,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(altlat, UNHEX('###'), altlatnonce) ) ) * cos( radians( AES_DECRYPT(altlng, UNHEX('###'), altlngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(altlat, UNHEX('###'), altlatnonce))) )
        ) AS distance
    FROM 
        (
            SELECT id, uid, title, description
            FROM 
                entrys 
            WHERE 
                    approx_lat > :min_lat
                AND approx_lat < :max_lat
                AND approx_lng > :min_lng
                AND approx_lng < :min_lng   
            ORDER BY 
                e.tmstmp 
            DESC
            LIMIT 2
            
        ) AS e
    INNER JOIN 
        location l 
    ON l.id = uid 
    LEFT JOIN
        altlocation al
    ON al.eid = e.id
    HAVING 
        distance <= 50

If I would add approx_lat and approx_lng to the entry table. The clue would be moving approx_lat and approx_lng to the entry table, that I could insert altlocation OR location only, so I could get rid off IF inside the query.

Is HAVING distance <= 50still necessary?


Solution

  • Use a bounding box in your query.

    Example (only changes are in the WHERE clause):

    SELECT 
        e.id, e.uid, e.title, e.description, l.place, l.placenonce, al.altplace, al.altplacenonce,
        IF(e.altloc=0,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(lat, UNHEX('###'), latnonce) ) ) * cos( radians( AES_DECRYPT(lng, UNHEX('###'), lngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(lat, UNHEX('###'), latnonce))) ) ,
            6371 * acos( cos( radians(:lat) ) * cos( radians( AES_DECRYPT(altlat, UNHEX('###'), altlatnonce) ) ) * cos( radians( AES_DECRYPT(altlng, UNHEX('###'), altlngnonce) ) - radians(:lng) ) + sin( radians(:lat) ) * sin(radians(AES_DECRYPT(altlat, UNHEX('###'), altlatnonce))) )
        ) AS distance
    FROM 
        entrys e 
    INNER JOIN 
        location l 
        ON l.id = e.uid 
    LEFT JOIN
        altlocation al
        ON al.eid = e.id
    WHERE 
        e.country = :countryid
        AND l.approx_lat > :min_lat
        AND l.approx_lat < :max_lat
        AND l.approx_lng > :min_lng
        AND l.approx_lng < :min_long    
    HAVING 
        distance <= 50
    ORDER BY 
        e.tmstmp 
    DESC
    LIMIT 2
    

    You would compute the :min_lat, :max_lat, :min_lng, and :max_lng before you execute the query. Those values would be generated from the desired radius from your :lat and :lng values (in this case, 50).

    How exactly to do that I suggest reading one of the many other answers, such as this one, that are all over the internet. Just search for geolocation bounding box to get started.

    Then, you can further improve performance by adding an index on the approx_lat and approx_lng columns. You can also try adding a couple composite indexes, of (approx_lat,approx_lng) and/or (approx_lng,approx_lat), as the optimizer MAY be able to make use of those. However, those are things I would strongly advise benchmarking to see if they provide any improvement or not. Additional columns to make these covering indexes may also help, but I'm focusing on the most basic issues at the moment.

    Note that what you are trying to optimize is already a difficult optimization problem. The fact that you need to encrypt your data, makes it even harder. However, so long as you can store these approximate values, we can bypass most of that extra difficulty.

    I would also strongly advice you to keep IF logic out of your WHERE clause. By including that, you force the optimizer to look up every single record to see if it matches that condition or not.

    In general, to get good performance, you need to limit the number of records that need to be checked. The IF statement cannot be optimized (it is not sargable). This is also why my answer requires you to store approximate values in order to be effective. If the data has to be decrypted first, that means that every single record has to be looked up and checked. That is what is going to kill your performance.

    Also note that in my example query I ignored the altlocation table in the WHERE clause. Ideally, if location and altlocation are identical, you should have only a single table for the data, and then JOIN to that table on the from a record that records a location id as being a primary or an "alternate".

    I hope this can at least help you on the right direction.