php mysql symfony geographic-distance

Calculating distance between Geo points for large datasets


I'm building an online Symfony application, and as part of the development process I've been tasked with sorting a set of database records by their distance from the logged-in user; the user can widen the search radius at will, up to the size of the whole world.

At any moment I have access to the GPS coordinates of the logged-in user, and in a database table I have saved the latitude and longitude of the various points of interest.

Currently there are only 400 records in the POI table, but due to the amount of data I must extract whenever I access it, the query time is already slightly over a second. Adding 400 trigonometric function calls to that workload will soon push the execution time beyond what's acceptable.

I therefore need a method that is both fast and accurate for calculating these distances.

I've read multiple articles suggesting the Haversine formula, but I found it too slow for my needs, and even an extensive article like this couldn't help.
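For reference, a plain PHP implementation of the Haversine formula (the standard formulation, written out here as a sketch since the question doesn't show the exact code used) looks like this:

```php
<?php
// Great-circle distance via the Haversine formula.
// Latitudes/longitudes are in degrees; the result is in kilometres,
// using a mean Earth radius of 6371 km.
function haversine(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $r    = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $r * asin(sqrt($a));
}

// One degree of longitude along the equator is ~111.19 km.
echo haversine(0.0, 0.0, 0.0, 1.0), "\n";
```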

Considering that I could soon reach thousands of POIs, with thousands of users logged in at the same time from all over the world, how could I approach (and hopefully solve) this problem?

I'm using PHP 7.0, Symfony 3.2, and Doctrine, with PDO to interface with a MySQL server running the InnoDB engine.

  • My customer values accuracy over speed, but can't stand waiting more than 5 seconds.
  • The query results are paged, so delegating the sorting to the client is impossible.
  • Both the database and the PHP server share the same (terrible) resource pool, and that pool is also shared with other applications.

As a side note, some of the POIs may expire after a certain date.


Solution

  • You asked me to add it as an answer, so I will.

    Are you sure the performance hit is from Haversine? We have used a PHP implementation of this formula in production at my work for about two years, and we handle a high volume of searches (around 150k a minute at peak times).

    I can't get into the details of my work, but I can say we use a combination of Sphinx, MongoDB, MySQL, and RabbitMQ.

    In any case, both Sphinx and MySQL suffer from a poor implementation of distance calculations, losing about 2 miles of accuracy at a 100-mile distance (this is why we do the calculation in PHP instead).

    One thing you can do is benchmark the time it takes to run the Haversine formula; good benchmarking is the first step when you have performance issues.
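    As a quick sanity check, you can time a batch of Haversine calls with microtime(true) alone, before reaching for any benchmarking class. The sketch below (the haversine() helper and the 400-point batch size are assumptions matching the question, not production code) times 400 distance calculations against a fixed origin:

    ```php
    <?php
    // Haversine helper: degrees in, kilometres out.
    function haversine(float $lat1, float $lon1, float $lat2, float $lon2): float
    {
        $dLat = deg2rad($lat2 - $lat1);
        $dLon = deg2rad($lon2 - $lon1);
        $a = sin($dLat / 2) ** 2
           + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
        return 2 * 6371.0 * asin(sqrt($a));
    }

    // 400 random POIs, matching the table size from the question.
    $pois = [];
    for ($i = 0; $i < 400; $i++) {
        $pois[] = [mt_rand(-90000, 90000) / 1000.0, mt_rand(-180000, 180000) / 1000.0];
    }

    $start = microtime(true); // float seconds
    foreach ($pois as $p) {
        haversine(45.0, 9.0, $p[0], $p[1]);
    }
    $elapsed = (microtime(true) - $start) * 1000.0; // milliseconds

    printf("400 distances in %.3f ms\n", $elapsed);
    ```

    On typical hardware this runs in well under a millisecond, which is a strong hint that the slowdown lies in the query itself rather than in the trigonometry.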

    While I am not a Symfony user, I do have a class I made for just this purpose. It's part of a larger framework I am building in my spare time (Evolution). You can get the class here:

    https://github.com/ArtisticPhoenix/Evo/blob/master/Evo/Benchmark.php

    It's very simple to use:

    $mark = Benchmark::getInstance()->mark();
    
    ... code to time ...
    
    echo Benchmark::getInstance()->format($mark);
    

    and it will output something like:

    10 milliseconds
    5 minutes 3 milliseconds
    etc.
    

    It's designed so you can use multiple marks:

    $mark = Benchmark::getInstance()->mark();
    
    ... code to time ...
    
    $mark1 = Benchmark::getInstance()->mark();
    
     ... more code to time ...
    
    echo "TotalTime: ".Benchmark::getInstance()->format($mark);
    echo "MethodTime: ".Benchmark::getInstance()->format($mark1);
    
    etc..
    

    It basically just records microtime(true) (true means return a float) when you call mark() and returns an identifier $mark; if you then call mark($mark) with that identifier, it subtracts the stored time from the current microtime(true). Calling format($mark) just makes the result more human-readable.
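    The mechanism is simple enough to sketch. This is not the Evo class itself, just a minimal illustration of the mark()/format() idea described above (the formatting rules are my own guess at the "10 milliseconds" / "5 minutes 3 milliseconds" output style):

    ```php
    <?php
    // Minimal illustration of the mark()/format() mechanism (not the Evo class).
    class MiniBenchmark
    {
        private $marks  = []; // start times keyed by mark id
        private $nextId = 0;

        // Record the current microtime(true) and return an identifier for it.
        public function mark(): int
        {
            $id = $this->nextId++;
            $this->marks[$id] = microtime(true);
            return $id;
        }

        // Elapsed time for a mark, as a human-readable string.
        public function format(int $id): string
        {
            $ms = (int) ((microtime(true) - $this->marks[$id]) * 1000);
            if ($ms >= 60000) {
                return sprintf('%d minutes %d milliseconds', intdiv($ms, 60000), $ms % 60000);
            }
            return sprintf('%d milliseconds', $ms);
        }
    }

    $bench = new MiniBenchmark();
    $mark  = $bench->mark();
    usleep(15000); // ~15 ms of "work"
    echo $bench->format($mark), "\n"; // e.g. "15 milliseconds"
    ```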

    Hope it helps you!