phpmysqlexact-matchboolean-search

PHP/MYSQL - Boolean Full-Text Searches - exact string operator ("") gives different result sets in Prepared Statement vs SQL Query


I am making an advanced search page on my website using PHP prepared statements to query a MySQL database MyISAM table using a Boolean Full-Text search.

It is very much like the first section of the Advanced Search provided by Google and covers the following criteria:

I retrieve each input value, clean and process each part of the collection of strings, then append the appropriate information to form the mySQL query via the prepared statement.

So in essence for the following search:

Would output as this string:

seagate* toshiba*

The query would result in something like this:

SELECT id, description
FROM `items` 
WHERE MATCH (description)
AGAINST ('seagate* toshiba*' IN BOOLEAN MODE)

Which would list all rows with the words "seagate" followed by anything and "toshiba" followed by anything in the description field.

This works fine, as does an output of:

-(750gb*) -(320gb*) seagate* toshiba*

Which would list all rows as above but exclude any rows with "750gb" and "320gb" in the description field.

By adding a value to the "All of these words" string we would get an output of:

+(16mb*) +(7200rpm*) -(750gb*) -(320gb*) seagate* toshiba*

Which would list all rows as above but show only downs that included both "16mb" and "7200rpm" in the description field.

Now for the problematic part. If i were to make use of the "Exact word of statement" string and add the value "serial ata 600" we would get an output of:

+(16mb*) +(7200rpm*) -(750gb*) -(320gb*) +("serial ata 600") seagate* toshiba*

Running this string and resulting query through as a sql query using phpmyadmin I get a result set of 2 rows matching the criteria for the search.

However, when running this on my website i get a result of 6 rows which would indicate that +("serial ata 600")" is being ignored entirely.

If i enter only a value for the string "Exact word of statement" so that we would get an output of:

+("serial ata 600")

The results would indicate that this string would list all rows that contained either "serial" or "ata" or "600".

By running the same query directly in mysql this result would list all rows that contained exactly the words "serial ata 600".

In the MySQL definition of this operator it states:

A phrase that is enclosed within double quote (“"”) characters matches 
only rows that contain the phrase literally, as it was typed.

This is the case in MySQL but when running the same query with PHP as a Prepared Statement returns a different result set.

Here is the the prepared statement:

if ($result = $link->prepare("
    SELECT id, description
    FROM `items` 
    WHERE MATCH (description)
    AGAINST (? IN BOOLEAN MODE)
"))
{
    $result->bind_param("s", $pattern);
    ... ETC
}

Here is the output of $pattern directly before this:

+("serial ata 600")

Could anyone possibly suggest the reason for this behavior as i do not see any reason for there to be any difference in the way things work here between PHP and MySQL.

I can provide any additional code relating to how the string is generated on request but the output is as it is in my example.

Any suggestions/advice/input/feedback or comment would be greatly appreciated.


Solution

  • This is one place where prepared statements fall flat on their faces. Internally, the preparation engine will be doing the equivalent of:

    $quoted = mysql_real_escape_string('+("serial ata 600")');
    

    which gives you the equivalent of

    +(\"serial ata 600\")
    

    Now you're not using a 3-word quote phrase anymore, you're sending in the following SEPARATE words:

    +("serial
    
    ata
    
    600")
    

    This is because the " quotes are SQL metacharacters, and you NEED them to be treated as metacharacters. However, because they ARE metacharacters, the prep engine will quote them, reducing them to normal plain-jane quotes, and now they're no longer enclosing your search phrase. They've become PART of the search phrase.

    Don't know if this would actually work, but you may have to rewrite the prepared statement to be more like

    ... MATCH AGAINST (CONCAT('("', ?, '")'))