php, zend-framework, zend-search-lucene

How can I control scoring and ordering in Zend_Search_Lucene so that one field is more important than another?


From what I understand after reading the documentation (especially the scoring section), every field I add has the same weight when search results are scored. I have the following code:

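protected static $_indexPath = 'tmp/search/indexes/projects';

public static function createSearchIndex()
{
    // true = create a new index at this path.
    // APPLICATION_PATH normally has no trailing slash, so add one explicitly.
    $_index = new Zend_Search_Lucene(APPLICATION_PATH . '/' . self::$_indexPath, true);

    $_projects_stmt = self::getProjectsStatement();

    while ($row = $_projects_stmt->fetch()) {

        $doc = new Zend_Search_Lucene_Document();

        // Tokenized, indexed and stored; both fields currently carry equal weight
        $doc->addField(Zend_Search_Lucene_Field::text('name', $row['name']));
        $doc->addField(Zend_Search_Lucene_Field::text('description', $row['description']));
        // Stored only (not searchable); used to look the project up afterwards
        $doc->addField(Zend_Search_Lucene_Field::unIndexed('projectId', $row['id']));

        $_index->addDocument($doc);
    }

    // Write the added documents, then merge the index segments
    $_index->commit();
    $_index->optimize();
}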

The code is simple: I generate an index from data fetched from the database and save it to the specified location.

I have looked in many places. What I want is for the name field to be more important than description (say, 75% versus 25%). So if I search for a phrase and it is found in the description of the first document and in the name of the second, the second document should get a score three times higher and appear higher in my results.
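To make that target concrete: the scoring section of the Zend_Search_Lucene documentation describes a document's score for a query roughly as in the first line below (check the manual for the exact definition). Assuming the phrase has the same tf and idf in both documents and the two matched fields contain the same number of terms, every factor cancels except a per-field weight, written here as w (our notation, not the library's), so the 75%/25% split amounts to a ratio of 3.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\,\mathrm{queryNorm}(q)\sum_{t \in q}
  \mathrm{tf}(t \in d)\,\mathrm{idf}(t)\,\mathrm{getBoost}(t.\mathrm{field} \in d)\,
  \mathrm{lengthNorm}(t.\mathrm{field} \in d)
\]
\[
\frac{\mathrm{score}(q,d_2)}{\mathrm{score}(q,d_1)}
  = \frac{w_{\mathrm{name}}}{w_{\mathrm{description}}}
  = \frac{0.75}{0.25} = 3
\]
```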

Is there any way to control scoring/ordering in this way?


Solution

  • I worked this out based on this documentation page. You need to create a new Similarity class and override its lengthNorm() method. I copied the method from the Default class, added a $multiplier variable, and set its value for the field I want to boost:

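    class Zend_Search_Lucene_Search_Similarity_Projects extends Zend_Search_Lucene_Search_Similarity_Default
    {
        /**
         * Length normalization factor for a field. It is computed while documents
         * are added, stored in the index and multiplied into the field's score.
         *
         * @param string $fieldName
         * @param integer $numTerms
         * @return float
         */
        public function lengthNorm($fieldName, $numTerms)
        {
            if ($numTerms == 0) {
                return 1E10;
            }
    
            $multiplier = 1;
    
            // Give matches in the name field three times the weight of other fields
            if ($fieldName == 'name') {
                $multiplier = 3;
            }
    
            // Default formula (1.0 / sqrt($numTerms)) scaled by $multiplier.
            // Note: 1.0 / sqrt($numTerms / $multiplier) would only scale it by sqrt($multiplier).
            return $multiplier / sqrt($numTerms);
        }
    }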
    

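    Then the only other change (to the code from the question) is to register the new Similarity class as the default just before indexing. Because lengthNorm() is evaluated when documents are added and its result is stored in the index, an existing index has to be rebuilt for the new weighting to take effect: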

    protected static $_indexPath = 'tmp/search/indexes/projects';
    
    public static function createSearchIndex()
    {
        // lengthNorm() runs while documents are added, so set the custom similarity first
        Zend_Search_Lucene_Search_Similarity::setDefault(new Zend_Search_Lucene_Search_Similarity_Projects());
    
        // APPLICATION_PATH normally has no trailing slash, so add one explicitly
        $_index = new Zend_Search_Lucene(APPLICATION_PATH . '/' . self::$_indexPath, true);
    
        $_projects_stmt = self::getProjectsStatement();
    
        while ($row = $_projects_stmt->fetch()) {
    
            $doc = new Zend_Search_Lucene_Document();
    
            $doc->addField(Zend_Search_Lucene_Field::text('name', $row['name']));
            $doc->addField(Zend_Search_Lucene_Field::text('description', $row['description']));
            $doc->addField(Zend_Search_Lucene_Field::unIndexed('projectId', $row['id']));
    
            $_index->addDocument($doc);
        }
    
        // Write the added documents, then merge the index segments
        $_index->commit();
        $_index->optimize();
    }
    

    I wanted to give the name field an extra boost, but you can do the same for any field.
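Where the multiplier goes inside lengthNorm() decides how large the boost really is. The sketch below is a stand-alone check that needs no Zend classes: it re-implements the norm formula as two plain functions (normInsideSqrt() and normOutsideSqrt() are hypothetical names, not library API) and replays the question's scenario, where document 1 matches in description and document 2 matches in name. The tf and idf values are made up and assumed equal for both matches, and both matched fields are assumed to have the same length, so only the field norm differs.

```php
<?php
// Stand-alone check of the lengthNorm() arithmetic; no Zend classes required.
// normInsideSqrt() and normOutsideSqrt() are hypothetical helper names.

// Multiplier applied inside the square root: 1.0 / sqrt($numTerms / $m)
function normInsideSqrt(int $numTerms, float $m): float
{
    if ($numTerms === 0) {
        return 1E10; // same special case as the Default similarity
    }
    return 1.0 / sqrt($numTerms / $m);
}

// Multiplier applied to the result: $m / sqrt($numTerms)
function normOutsideSqrt(int $numTerms, float $m): float
{
    if ($numTerms === 0) {
        return 1E10;
    }
    return $m / sqrt($numTerms);
}

// Question's scenario: doc1 matches in 'description', doc2 matches in 'name'.
$numTerms = 4;   // assumed length of both matched fields
$tf       = 1.0; // assumed, identical for both matches
$idf      = 1.5; // assumed, identical for both matches

$variants = [
    'inside sqrt'  => 'normInsideSqrt',
    'outside sqrt' => 'normOutsideSqrt',
];

$ratios = [];
foreach ($variants as $label => $norm) {
    // Simplified per-term contribution: tf * idf * norm (multiplier 3 for 'name' only)
    $doc1 = $tf * $idf * $norm($numTerms, 1.0); // description match
    $doc2 = $tf * $idf * $norm($numTerms, 3.0); // name match
    $ratios[$label] = $doc2 / $doc1;
    printf("%-12s doc2/doc1 = %.3f\n", $label, $ratios[$label]);
}
```

With the multiplier inside the square root, a name match scores about 1.73 (the square root of 3) times a description match rather than 3 times; getting a factor of 3 that way would need a multiplier of 9. Keep in mind that each stored norm is encoded into a single byte, so the ratio in a real index is only approximately the computed value.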