
ElasticSearch Partial Mappings with Spaces


My partial mappings and queries work great until a space is involved. For example, the term Jon Doe breaks down into this term vector:

"terms": {
            "j": {
               "term_freq": 1
            },
            "jo": {
               "term_freq": 1
            },
            "jon": {
               "term_freq": 1
            },
            "d": {
               "term_freq": 1
            },
            "do": {
               "term_freq": 1
            },
            "doe": {
               "term_freq": 1
            }
         }
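
For reference, a listing like this can be pulled back with the _termvectors API. A sketch, assuming a recent Elasticsearch version, with myindex and the document id 1 standing in as placeholders:

    GET /myindex/_termvectors/1?fields=name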

But I would like it to be:

"terms": {
            "j": {
               "term_freq": 1
            },
            "jo": {
               "term_freq": 1
            },
            "jon": {
               "term_freq": 1
            },
            "jon ": {
               "term_freq": 1
            },
            "jon d": {
               "term_freq": 1
            },
            "jon do": {
               "term_freq": 1
            },
            "jon doe": {
               "term_freq": 1
            }
         }

Here are my mappings and settings:

Mappings:

   name: {
    type: 'string',
    term_vector: 'yes',
    analyzer: 'ngram_analyzer',
    search_analyzer: 'standard',
    include_in_all: true
  }

Settings:

settings: {
    index: {
      analysis: {
        filter: {
          ngram_filter: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 15
          }
        },
        analyzer: {
          'ngram_analyzer': {
            filter: [
              'lowercase',
              'ngram_filter'
            ],
            type: 'custom',
            tokenizer: 'standard'
          }
        }
      },
      number_of_shards: 1,
      number_of_replicas: 1
    }
  }
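
For context, the two fragments above combine into a single create-index request over the REST API. This is a sketch assuming a current Elasticsearch release, so string becomes text and include_in_all (removed in Elasticsearch 6.0) is dropped; the index name myindex is a placeholder:

    PUT /myindex
    {
      "settings": {
        "index": {
          "analysis": {
            "filter": {
              "ngram_filter": {
                "type": "edge_ngram",
                "min_gram": 1,
                "max_gram": 15
              }
            },
            "analyzer": {
              "ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "ngram_filter"]
              }
            }
          },
          "number_of_shards": 1,
          "number_of_replicas": 1
        }
      },
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "term_vector": "yes",
            "analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
          }
        }
      }
    }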

How would I go about this?


Solution

  • You just need to use a different tokenizer in your custom analyzer. The standard tokenizer splits the input on whitespace, so jon and doe are n-grammed independently; the keyword tokenizer emits the whole input as a single token, which lets the edge_ngram filter produce grams that span the space:

        "analyzer": {
          "ngram_analyzer": {
            "filter": [
              "lowercase",
              "ngram_filter"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
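
    You can verify the analyzer output with the _analyze API. A sketch; myindex stands in for whatever index you create with these settings:

        POST /myindex/_analyze
        {
          "analyzer": "ngram_analyzer",
          "text": "Jon Doe"
        }

    This should return the tokens j, jo, jon, "jon ", "jon d", "jon do", and "jon doe", matching the desired term vector above.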