
Elasticsearch not working with 'not_analyzed' index


I can't figure out why Elasticsearch does not search as I expect on not_analyzed fields. I have the following settings in my model:

settings index: { number_of_shards: 1 } do
  mappings dynamic: 'false' do
    indexes :id
    indexes :name, index: 'not_analyzed'
    indexes :email, index: 'not_analyzed'
    indexes :contact_number
  end
end

def as_indexed_json(options = {})
  as_json(only: [ :id, :name, :username, :user_type, :is_verified, :email, :contact_number ])
end

The mapping in Elasticsearch looks right, as shown below:

{
  "users-development" : {
    "mappings" : {
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "contact_number" : {
            "type" : "string"
          },
          "email" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "id" : {
            "type" : "string"
          },
          "name" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

The issue is that when I search on the not_analyzed fields (name and email, which I deliberately set to not_analyzed), it only matches on the full value. In the example below it should return John, Johny and Tiger, all 3 records, but it returns only 2 of them.

I am searching as follows:

  settings = {
    query: {
      filtered: {
        filter: {
          bool: {
            must: [
              { terms: { name: [ "john", "tiger" ] } },
            ]
          }
        }
      }
    },
    size: 10
  }

  User.__elasticsearch__.search(settings).records

This is how I index my user object in an after_save callback:

# Index a single document; indices.create creates a whole index and takes no id.
User.__elasticsearch__.client.index(
  index: User.index_name,
  type: User.document_type,
  id: self.id,
  body: self.as_indexed_json
)

Some of the documents that should match:

[
  {
    "_index" : "users-development",
    "_type" : "user",
    "_id" : "670",
    "_score" : 1.0,
    "_source" : {"id":670,"email":"john@monkeyofdoom.com","name":"john baba","contact_number":null}
  },
  {
    "_index" : "users-development",
    "_type" : "user",
    "_id" : "671",
    "_score" : 1.0,
    "_source" : {"id":671,"email":"human@monkeyofdoom.com","name":"Johny Rocket","contact_number":null}
  },
  {
    "_index" : "users-development",
    "_type" : "user",
    "_id" : "736",
    "_score" : 1.0,
    "_source" : {"id":736,"email":"tiger@monkeyofdoom.com","name":"tiger sherof","contact_number":null}
  }
]

Any suggestions, please?


Solution

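  • I think you would get the desired results with the keyword tokenizer combined with a lowercase filter, rather than with not_analyzed.

    The reason john* did not match Johny is case sensitivity: a not_analyzed field stores the value exactly as given, so the indexed token is "Johny Rocket" with a capital J. This setup will work: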

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "keyword_analyzer": {
              "type": "custom",
              "filter": [
                "lowercase"
              ],
              "tokenizer": "keyword"
            }
          }
        }
      },
      "mappings": {
        "my_type": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "keyword_analyzer"
            }
          }
        }
      }
    }
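
To see why the lowercase filter matters, here is a minimal plain-Ruby sketch of what each setup stores in the inverted index. It is a toy model, not Elasticsearch's implementation: the `ANALYZERS` table and the `prefix_matches` helper are hypothetical names made up for this illustration, and the names are taken from the sample documents in the question.

```ruby
# Toy model of two indexing strategies (not the real Elasticsearch code).
# Each lambda returns the tokens that would end up in the inverted index.
ANALYZERS = {
  # not_analyzed: the whole value is stored verbatim as one token.
  not_analyzed: ->(value) { [value] },
  # keyword tokenizer + lowercase filter: still one token, but lowercased.
  keyword_analyzer: ->(value) { [value.downcase] }
}.freeze

NAMES = ["john baba", "Johny Rocket", "tiger sherof"].freeze

# A prefix-style query such as john* is compared against the stored tokens
# as-is, so it only matches when the case of the stored token agrees.
def prefix_matches(analyzer, prefix)
  analyze = ANALYZERS.fetch(analyzer)
  NAMES.select { |name| analyze.call(name).any? { |token| token.start_with?(prefix) } }
end

p prefix_matches(:not_analyzed, "john")     # => ["john baba"]
p prefix_matches(:keyword_analyzer, "john") # => ["john baba", "Johny Rocket"]
```

With not_analyzed, "Johny Rocket" keeps its capital J and the lowercase prefix misses it; with the lowercased keyword token both names match.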
    

    Now john* will match Johny Rocket, because the indexed token is the lowercased "johny rocket". If you have several requirements, you should use multi-fields. A terms query for john won't return john baba, because the inverted index contains only the single token "john baba" and no token john. You could use the standard analyzer on one field and the keyword analyzer on the other.
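
A multi-field setup along those lines could look roughly like the sketch below, written as plain Ruby hashes that would be sent as request bodies. This is an assumption-laden sketch rather than a tested configuration: the sub-field name `raw`, the constant `INDEX_BODY` and the two helper methods are hypothetical, and the `string` type and `filtered` query follow the Elasticsearch version used in the question.

```ruby
require "json"

# Index body: name is analyzed with the standard analyzer (one token per word),
# and the hypothetical sub-field name.raw keeps the whole lowercased value.
INDEX_BODY = {
  settings: {
    analysis: {
      analyzer: {
        keyword_analyzer: { type: "custom", tokenizer: "keyword", filter: ["lowercase"] }
      }
    }
  },
  mappings: {
    user: {
      properties: {
        name: {
          type: "string",
          analyzer: "standard",
          fields: { raw: { type: "string", analyzer: "keyword_analyzer" } }
        }
      }
    }
  }
}.freeze

# Whole-word lookup: terms are not analyzed, so lowercase them to match
# the lowercased tokens produced by the standard analyzer.
def whole_word_filter(words)
  { query: { filtered: { filter: { terms: { name: words.map(&:downcase) } } } } }
end

# Prefix lookup on the full value: also not analyzed, so lowercase the input.
def full_value_prefix(prefix)
  { query: { prefix: { "name.raw" => prefix.downcase } } }
end

puts JSON.pretty_generate(INDEX_BODY)
puts JSON.generate(whole_word_filter(%w[john tiger]))
puts JSON.generate(full_value_prefix("John"))
```

With this layout, the terms filter on name would be expected to find john baba and tiger sherof (tokens john and tiger exist), while the prefix query on name.raw would find john baba and Johny Rocket.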