mysqlmatchagainst

How to eliminate bias against shorter rows in MATCH/AGAINST?


I am working on a simple search interface over a MyISAM table in MySQL, using full-text search with MATCH/AGAINST.

It seems to work at first glance, but on closer inspection the ranking appears to be biased towards shorter rows. My guess is that shorter rows get a higher score because a larger percentage of their words match the search terms.

Here is the query I am running against the MySQL database; the results, as displayed by the application, are in the screenshot below.

SELECT
    report,
    status,
    GROUP_CONCAT(DISTINCT status) AS statuses,
    GROUP_CONCAT(DISTINCT docID) AS docIDs,
    GROUP_CONCAT(DISTINCT analyst) AS analysts,
    GROUP_CONCAT(DISTINCT region) AS regions,
    GROUP_CONCAT(DISTINCT country) AS countries,
    GROUP_CONCAT(DISTINCT topic) AS topics,
    GROUP_CONCAT(DISTINCT date) AS dates,
    MAX(date) AS date,
    MIN(date) AS mindate,
    MAX(docID) AS docID,
    GROUP_CONCAT(DISTINCT event) AS events,
    GROUP_CONCAT(DISTINCT rule) AS rules,
    GROUP_CONCAT(DISTINCT link SEPARATOR ' ') AS links,
    GROUP_CONCAT(DISTINCT province) AS provinces,
    MATCH (
        region, country, province, topic, event
    )
    AGAINST (
        'toxic china'
    ) AS score
FROM search_reports
GROUP BY report
ORDER BY score DESC

For simplicity's sake, I have left the AGAINST argument as a constant while I work out this issue. Currently it only searches for 'toxic china'. What I did not expect is that some results that don't contain "china" are ranked higher than results that do contain it.

Search Results


Solution

  • You may want to try IN BOOLEAN MODE, like so:

    AGAINST (
        'toxic china' IN BOOLEAN MODE
    )

    as this should just be a true/false match on the terms.
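
As a rough sketch of how that suggestion might look in a full query: the version below reuses the table and columns from the question but drops the GROUP BY to keep it short. The `+` prefixes are an addition that is not part of the suggestion above. In boolean mode, a word with no operator is optional, so a row containing only "toxic" can still match; `+word` requires the word to be present in every returned row. This assumes a FULLTEXT index covering the same five columns.

```sql
-- Sketch only: table and column names are taken from the question,
-- the '+' operators are an assumption about the intended behaviour
-- (both words required), not something stated in the answer.
SELECT
    report,
    region,
    country,
    province,
    topic,
    event,
    MATCH (region, country, province, topic, event)
        AGAINST ('+toxic +china' IN BOOLEAN MODE) AS score
FROM search_reports
-- Filter on the same full-text expression so non-matching rows are excluded.
WHERE MATCH (region, country, province, topic, event)
        AGAINST ('+toxic +china' IN BOOLEAN MODE)
ORDER BY score DESC;
```

If both words should stay optional but rows lacking "china" should never appear, `'toxic +china'` is the variant to try instead.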