ruby-on-rails elasticsearch elasticsearch-rails

Spell check with ngram for Elasticsearch not working in Rails


I have added an edge n-gram analyzer to my model for spell checking, so that if the user enters something like "Rentaal", the search still returns the correct data for "Rental".

document.rb code

require 'elasticsearch/model'

class Document < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
  belongs_to :user

  Document.import force: true


  def self.search(query)
    __elasticsearch__.search({
      query: {
        multi_match: {
          query: query,
          fields: ['name^10', 'service']
        }
      }
    })
  end


  settings index: {
    "number_of_shards": 1,
    analysis: {
      analyzer: {
        edge_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "edge_ngram_filter", "stop", "kstem"]
        }
      }
    },
    filter: {
      edge_ngram_filter: { type: "edgeNGram", min_gram: "3", max_gram: "20" }
    }
  } do
    mapping do
      indexes :name, type: "string", analyzer: "edge_ngram_analyzer"
      indexes :service, type: "string", analyzer: "edge_ngram_analyzer"
    end
  end
end

search controller code:

def search
  if params[:query].nil?
    @documents = []
  else
    @documents = Document.search params[:query]
  end
end

However, if I enter Rentaal or any misspelled word, it does not display anything. In my console

     @documents.results.to_a 

gives an empty array.

What am I doing wrong here? Let me know if more data is required.


Solution

  • Try adding fuzziness to your multi_match query:

    {
      "query": {
        "multi_match": {
          "query": "Rentaal",
          "fields": ["name^10", "service"],
          "fuzziness": "AUTO"
        }
      }
    }
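
    In the Rails model from the question, the same query body could be built in Ruby. This is only a minimal sketch: `fuzzy_multi_match` is a hypothetical helper name (not part of elasticsearch-rails), and the field list is copied from the question.

    ```ruby
    # Hypothetical helper: builds the multi_match body with fuzziness enabled.
    def fuzzy_multi_match(query, fields: ['name^10', 'service'], fuzziness: 'AUTO')
      {
        query: {
          multi_match: {
            query: query,
            fields: fields,
            fuzziness: fuzziness
          }
        }
      }
    end

    # Assumed usage inside the Document model:
    #   def self.search(query)
    #     __elasticsearch__.search(fuzzy_multi_match(query))
    #   end
    ```

    Keeping the hash in a separate method makes it easy to inspect the generated query without a running cluster.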
    

    Explanation

    The kstem filter reduces words to their root forms, so it does not do what you expected here: it would correctly handle inputs like Renta or Rent, but not the misspelling you provided.

    You can check which tokens the analyzer produces with the following request:

    curl -X POST \
      'http://localhost:9200/my_index/_analyze?pretty=true' \
      -H 'Content-Type: application/json' \
      -d '{
        "analyzer": "edge_ngram_analyzer",
        "text": ["rentaal"]
      }'
    

    As a result I see:

    {
        "tokens": [
            {
                "token": "ren"
            },
            {
                "token": "rent"
            },
            {
                "token": "renta"
            },
            {
                "token": "rentaa"
            },
            {
                "token": "rentaal"
            }
        ]
    }
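
    These tokens are simply the prefixes of the lowercased input, from min_gram (3) up to max_gram (20) characters. A rough plain-Ruby illustration of that single step follows; it ignores the tokenizer and the stop and kstem filters, and `edge_ngrams` is a made-up helper, not an Elasticsearch API.

    ```ruby
    # Illustrative only: mimics what an edge n-gram filter emits for one token.
    def edge_ngrams(token, min_gram: 3, max_gram: 20)
      word = token.downcase
      upper = [word.length, max_gram].min
      (min_gram..upper).map { |len| word[0, len] }
    end

    p edge_ngrams('Rentaal') # ["ren", "rent", "renta", "rentaa", "rentaal"]
    p edge_ngrams('Rental')  # ["ren", "rent", "renta", "rental"]
    ```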
    

    So a typical misspelling is handled much better by applying fuzziness.
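
    Fuzzy matching is based on edit distance. With "AUTO", Elasticsearch allows up to two edits for terms longer than five characters, one edit for terms of 3 to 5 characters, and none for shorter terms. The sketch below uses a plain Levenshtein distance as an approximation (Elasticsearch additionally counts a transposition as a single edit by default); `levenshtein` and `auto_max_edits` are illustrative helpers, not library calls.

    ```ruby
    # Plain Levenshtein distance (insertions, deletions, substitutions).
    def levenshtein(a, b)
      prev = (0..b.length).to_a
      a.each_char.with_index(1) do |ca, i|
        curr = [i]
        b.each_char.with_index(1) do |cb, j|
          cost = ca == cb ? 0 : 1
          curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
        end
        prev = curr
      end
      prev.last
    end

    # Edits allowed by fuzziness "AUTO" for a term of the given length.
    def auto_max_edits(length)
      return 0 if length <= 2
      return 1 if length <= 5
      2
    end

    term = 'rentaal'
    distance = levenshtein(term, 'rental')
    puts distance                                 # 1
    puts distance <= auto_max_edits(term.length)  # true
    ```

    "rentaal" is one deletion away from "rental", which is within the two edits that "AUTO" permits for a seven-character term.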