phpmysql

How Long is Acceptable to Run a mySQL Query


When I test this query it takes around 17 - 20 seconds to complete.

    UPDATE ex_hotel_temp
SET specialoffer='1'
WHERE hid IN
    (SELECT hid
     FROM ex_dates
     WHERE offer_id IS NOT NULL
       OR xfory_id IS NOT NULL
       OR long_id IS NOT NULL
       OR early_id IS NOT NULL
     GROUP BY hid)

Although this is a cronjob running at night to do some housekeeping on the database (there is no site visitor sitting waiting for the result), it seems to me to be an unacceptable load on the server. Am I right, or am I fussing over nothing?

When I run each element of the query individually it takes about 0.001 sec. Should I therefore break it up into a series of simple queries instead?

LATER EDIT: With the assistance of the comments and answers received, I decided to split the query into two. The result is this:

$query_hotel = "SELECT hid FROM ex_dates WHERE offer_id IS NOT NULL OR xfory_id IS NOT NULL OR long_id IS NOT NULL OR early_id IS NOT NULL GROUP BY hid";
$hotel = mysql_query($query_hotel, $MySQL_XXX) or die(mysql_error());
$row_hotel = mysql_fetch_assoc($hotel);
$totalRows_hotel = mysql_num_rows($hotel);

$hid_array = array();
do {
    array_push($hid_array,$row_hotel['hid']);
}while ($row_hotel = mysql_fetch_assoc($hotel)) ;
$hid_list = implode("','",$hid_array);
$hid_list = "'$hid_list'";

// Mark the hotels as having a special offer 
$query_update = "UPDATE ex_hotel_temp SET specialoffer='1' WHERE hid IN ($hid_list)";
$result = mysql_query($query_update, $MySQL_XXX) or die(mysql_error());

It's not pretty, but it works.

As there are two queries with a bit of PHP thrown in, I can't get an accurate measure of how long it takes to run, but just by looking at the time for the page to load it is obviously much closer to the fractions of a second than 20 seconds.

Thanks to all.


Solution

  • You say that this runs overnight in a CRON job, and you say this supports a "site" - if this is a public-facing website, yes, you should worry.

    There's no such thing as business hours on the interwebs - there will be visitors interacting with your website, hopefully trying to buy stuff, at all hours of the day; even "national" sites tend to see traffic through the night in my experience (though typically at only a small rate compared to peak hours).

    It's possible that your CRON job is causing other queries to run slowly too - it depends on what's causing the query to run slowly, and whether you're using transactions. The problem with web sites is that users tend to get impatient when the site is slow, refreshing the page, often creating more traffic to the database, and if there are other slow queries on the site, it's not impossible that the site becomes unusable for a while, even with a fairly limited number of users.

    So, if there may be users of your site while the script runs, it's definitely worth tidying up.

    The other reason you might worry is that in my experience, database performance is not linear - queries don't slow down in linear proportion to the number of records in your table. Instead, they tend to be hockey-stick like - everything is fine, until you reach a tipping point, and everything grinds to a halt. You may be riding that hockey-stick curve, and it could easily escalate from 17-20 seconds to 17-20 minutes.

    The fix looks simple - the group by is redundant, and splitting the query into smaller queries should help the subselect use indices.