ruby-on-rails elasticsearch elasticsearch-rails

Function Score attribute to rank searches based on clicks not working with elastic search and rails


I have implemented a function_score query in my Document model, which has a clicks field that keeps track of the number of views per document. Now I want documents with more clicks to get higher priority and appear at the top of the search results.

My document.rb code

require 'elasticsearch/model'

class Document < ActiveRecord::Base
  include Elasticsearch::Model

 def self.search(query)
  __elasticsearch__.search(
    {
      query: {
        function_score: {
          query: {
            multi_match: {
              query: query,
              fields: ['name', 'service'],
              fuzziness: "AUTO"
            }
          },
          field_value_factor: {
            field: 'clicks',
            modifier: 'log1p',
            factor: 2 
          }
        }
      }
    }
  )
 end

 settings index: {
   number_of_shards: 1,
   analysis: {
     analyzer: {
       edge_ngram_analyzer: {
         type: "custom",
         tokenizer: "standard",
         filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
       }
     },
     # Custom token filters must live under analysis, next to the analyzers.
     filter: {
       ascii_folding: { type: "asciifolding", preserve_original: true },
       edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
     }
   }
 } do
  mapping do
    indexes :name, type: "string", analyzer: "edge_ngram_analyzer", 
             term_vector: "with_positions"
    indexes :service, type: "string", analyzer: "edge_ngram_analyzer", 
             term_vector: "with_positions"
  end 
 end

end

Here is the search view:

<h1>Document Search</h1>

 <%= form_tag search_path, method: :get do %>
 <p>
  <%= label_tag :query, "Search for" %>
  <%= text_field_tag :query, params[:query] %>
  <%= submit_tag "Go", name: nil %>
 </p>
<% end %>
<% if @documents %>
  <ul class="search_results">
    <% @documents.each do |document| %>
    <li>
       <h3>
          <%= link_to document.name, controller: "documents", action: "show", 
         id: document._id %>   
       </h3>   
   </li>
   <% end %>
 </ul>
<% else %>
 <p>Your search did not match any documents.</p>
<% end %>
 <br/>

When I search for Estamp, I get the results in the following order:

 Franking and Estamp # clicks 5
 Notary and Estamp   # clicks 8

So even though Notary and Estamp has more clicks, it does not come to the top of the search results. How can I achieve this?

This is what I get when I run it on the console.

POST _search

      "hits": {
        "total": 2,
        "max_score": 1.322861,
        "hits": [
          {
            "_index": "documents",
            "_type": "document",
            "_id": "13",
            "_score": 1.322861,
            "_source": {
              "id": 13,
              "name": "Franking and Estamp",
              "service": "Estamp",
              "user_id": 1,
              "clicks": 7
            }
          },
          {
            "_index": "documents",
            "_type": "document",
            "_id": "14",
            "_score": 0.29015404,
            "_source": {
              "id": 14,
              "name": "Notary and Estamp",
              "service": "Notary",
              "user_id": 1,
              "clicks": 12
            }
          }
        ]
      }

Here the scores of the documents are not being adjusted enough by the clicks to change the order.


Solution

  • Without seeing your indexed data it's not easy to answer, but looking at the query, one thing comes to mind. I'll show it with a short example:

    Example 1:

    I've indexed the following documents:

    {"name":"Franking and Estampy", "service" :"text", "clicks": 5}
    {"name":"Notary and Estamp", "service" :"text", "clicks": 8}
    

    Running the same query you provided gave this result:

    "hits": {
        "total": 2,
        "max_score": 4.333119,
        "hits": [
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2iwkems7jEvHyvnccV",
                "_score": 4.333119,
                "_source": {
                    "name": "Notary and Estamp",
                    "service": "text",
                    "clicks": 8
                }
            },
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2iwo6ds7jEvHyvnccW",
                "_score": 3.6673431,
                "_source": {
                    "name": "Franking and Estampy",
                    "service": "text",
                    "clicks": 5
                }
            }
        ]
    }
    

    So everything is fine: the document with 8 clicks got the higher score (_score field value) and the order is correct.

    Example 2:

    I noticed in your query that the name field is boosted with a high factor. So what would happen if I had the following data indexed?

    {"name":"Franking and Estampy", "service" :"text", "clicks": 5}
    {"name":"text", "service" :"Notary and Estamp", "clicks": 8}
    

    And result:

    "hits": {
        "total": 2,
        "max_score": 13.647502,
        "hits": [
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2iwo6ds7jEvHyvnccW",
                "_score": 13.647502,
                "_source": {
                    "name": "Franking and Estampy",
                    "service": "text",
                    "clicks": 5
                }
            },
            {
                "_index": "script",
                "_type": "test",
                "_id": "AV2iwkems7jEvHyvnccV",
                "_score": 1.5597181,
                "_source": {
                    "name": "text",
                    "service": "Notary and Estamp",
                    "clicks": 8
                }
            }
        ]
    }
    

    Although Franking and Estampy has only 5 clicks, it scores much higher than the second document, which has more clicks.

    So the point is that in your query the number of clicks is not the only factor that affects the score and the final order of documents. Without the real data this is only a guess on my part. You can run the query yourself with a REST client and check the scores, fields, and matching phrases.
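
    One way to check the scoring yourself is to ask Elasticsearch to explain each hit. The following is a minimal sketch, assuming the same query as in the question. The helper name explained_search_body is hypothetical; "explain": true is the standard search-body option that adds an _explanation section to every hit.

    ```ruby
    require 'json'

    # Hypothetical helper: builds the question's function_score query with
    # "explain" switched on so each hit carries an _explanation breakdown.
    def explained_search_body(term)
      {
        explain: true,
        query: {
          function_score: {
            query: {
              multi_match: { query: term, fields: ['name', 'service'], fuzziness: 'AUTO' }
            },
            field_value_factor: { field: 'clicks', modifier: 'log1p', factor: 2 }
          }
        }
      }
    end

    # Print the JSON so it can be pasted into any REST client as the body of a _search request.
    puts JSON.pretty_generate(explained_search_body('Estamp'))
    ```

    The same hash could also be passed to __elasticsearch__.search in the model. The explanation shows how much of each _score comes from the text match and how much from the clicks function.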

    Update

    Based on your search result, you can see that the document with id=13 has the term Estamp in both fields (name and service). That is why this document got the higher score: in the scoring algorithm, having the term in both fields currently outweighs having a higher number of clicks. If you want the clicks field to have a bigger impact on the score, try experimenting with factor (it probably should be higher) and modifier ("modifier": "square" could work in your case). The possible values are listed in the Elasticsearch documentation for field_value_factor.

    Try for example this combination:

    {
      "query": {
        "function_score": {
          "query": { ... }, // same as before
          "field_value_factor": {
            "field": "clicks",
            "modifier": "square",
            "factor": 3
          }
        }
      }
    }
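
    To see why the modifier matters, the arithmetic can be sketched in plain Ruby. This is only an illustration under stated assumptions: the default boost_mode of multiply, field_value_factor applying the modifier to factor * clicks, and text-match scores backed out of the _score values posted in the question (clicks 7 and 12). The helper and constant names are hypothetical.

    ```ruby
    # Hypothetical helper mirroring field_value_factor: the modifier is
    # applied to factor * value.
    MODIFIERS = {
      "none"   => ->(v) { v },
      "log1p"  => ->(v) { Math.log10(v + 1) },
      "square" => ->(v) { v**2 }
    }.freeze

    def field_value_factor(value, factor:, modifier:)
      MODIFIERS.fetch(modifier).call(factor * value)
    end

    # Scores and click counts as reported in the question's console output.
    reported = {
      "Franking and Estamp" => { score: 1.322861,   clicks: 7 },
      "Notary and Estamp"   => { score: 0.29015404, clicks: 12 }
    }

    reported.each do |name, doc|
      # Assumes boost_mode "multiply": _score = query_score * function_value.
      log_boost   = field_value_factor(doc[:clicks], factor: 2, modifier: "log1p")
      query_score = doc[:score] / log_boost
      squared     = query_score * field_value_factor(doc[:clicks], factor: 3, modifier: "square")
      puts format("%-20s query=%.3f log1p=%.3f square=%.1f", name, query_score, log_boost, squared)
    end
    ```

    With log1p the two boosts come out close together (about 1.18 and 1.40), so the text-match score dominates. With square the boosts are much further apart, but note that in multiply mode factor scales both documents by the same constant, so it is the modifier that changes their relative weight. Whether that is enough to reorder the results depends on how far apart the text-match scores are, so it is worth checking the numbers for your own data.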
    

    Update 2 - scoring based only on number of clicks

    If the value of the clicks field should be the only thing that affects the score, you can try "boost_mode": "replace". In this case only the function score is used and the query score is ignored, so the frequency of the term Estamp in the name and service fields will have no impact on the score. Try this query:

    {
      "query": {
        "function_score": { 
          "query": { 
            "multi_match": {
              "query":    "Estamp",
              "fields": [ "name", "service"],
              "fuzziness": "AUTO"
            }
          },
          "field_value_factor": { 
            "field": "clicks",
            "factor": 1
          },
          "boost_mode": "replace"
        }
      }
    }
    

    It gave me:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 1,
            "successful": 1,
            "failed": 0
        },
        "hits": {
            "total": 2,
            "max_score": 5,
            "hits": [
                {
                    "_index": "script",
                    "_type": "test",
                    "_id": "AV2nI0HkJPYn0YKQxRvd",
                    "_score": 5,
                    "_source": {
                        "name": "Notary and Estamp",
                        "service": "Notary",
                        "clicks": 5
                    }
                },
                {
                    "_index": "script",
                    "_type": "test",
                    "_id": "AV2nIwKvJPYn0YKQxRvc",
                    "_score": 4,
                    "_source": {
                        "name": "Franking and Estamp",
                        "service": "Estamp",
                        "clicks": 4
                    }
                }
            ]
        }
    }
    

    This may be the one you are looking for (note that the values "_score": 5 and "_score": 4 match the number of clicks).
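
    In the Rails model, the same request body could be built roughly like this. It is a sketch only: clicks_only_query is a hypothetical helper name, and it assumes every indexed document has a numeric clicks value.

    ```ruby
    # Hypothetical helper returning the search body for click-only ranking.
    def clicks_only_query(term)
      {
        query: {
          function_score: {
            query: {
              multi_match: { query: term, fields: ['name', 'service'], fuzziness: 'AUTO' }
            },
            # No modifier: the raw click count (times factor 1) is the function value.
            field_value_factor: { field: 'clicks', factor: 1 },
            # "replace" discards the text-match score and keeps only the function value.
            boost_mode: 'replace'
          }
        }
      }
    end

    body = clicks_only_query('Estamp')
    puts body[:query][:function_score][:boost_mode]
    ```

    Inside the model's search method, the hash could be passed straight through, for example __elasticsearch__.search(clicks_only_query(query)). The multi_match query still decides which documents match; only their order is taken from clicks.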