
Elasticsearch shingle token filter not working


I'm trying this on a local Elasticsearch 1.7.5 installation:

http://localhost:9200/_analyze?filter=shingle&tokenizer=keyword&text=alkis stack
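Equivalently, as a curl command (note that the space in the text parameter needs to be URL-encoded):

curl -XGET "http://localhost:9200/_analyze?filter=shingle&tokenizer=keyword&text=alkis%20stack"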

I see this

{
   "tokens":[
      {
         "token":"alkis stack",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      }
   ]
}

And I expected to see something like this

{
   "tokens":[
      {
         "token":"alkis stack",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      },
      {
         "token":"stack alkis",
         "start_offset":0,
         "end_offset":11,
         "type":"word",
         "position":1
      }
   ]
}

Am I missing something?

Update

{
  "number_of_shards": 2,
  "number_of_replicas": 0,
  "analysis": {
    "char_filter": {
      "map_special_chars": {
        "type": "mapping",
        "mappings": [
          "- => \\u0020",
          ". => \\u0020",
          "? => \\u0020",
          ", => \\u0020",
          "` => \\u0020",
          "' => \\u0020",
          "\" => \\u0020"
        ]
      }
    },
    "filter": {
      "permutate_fullname": {
        "type": "shingle",
        "max_shingle_size": 4,
        "min_shingle_size": 2,
        "output_unigrams": true,
        "token_separator": " ",
        "filler_token": "_"
      }
    },
    "analyzer": {
      "fullname_analyzer_search": {
        "char_filter": [
          "map_special_chars"
        ],
        "filter": [
          "asciifolding",
          "lowercase",
          "trim"
        ],
        "type": "custom",
        "tokenizer": "keyword"
      },
      "fullname_analyzer_index": {
        "char_filter": [
          "map_special_chars"
        ],
        "filter": [
          "asciifolding",
          "lowercase",
          "trim",
          "permutate_fullname"
        ],
        "type": "custom",
        "tokenizer": "keyword"
      }
    }
  }
}
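For reference, these settings are applied at index creation time under a top-level "settings" key (a sketch; settings.json is assumed to contain the JSON above wrapped in a "settings" object, and INDEX_NAME is a placeholder):

curl -XPUT "http://localhost:9200/INDEX_NAME" --data-binary @settings.json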

And I'm trying to test like this

http://localhost:9200/INDEX_NAME/_analyze?analyzer=fullname_analyzer_index&text=alkis stack
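Again as a curl command, with the space URL-encoded:

curl -XGET "http://localhost:9200/INDEX_NAME/_analyze?analyzer=fullname_analyzer_index&text=alkis%20stack"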

Solution

  • Index the first name and the last name in two separate fields in ES, just as you have them in the DB. The text received as a query can be analyzed for you at search time (match does it, for example, and so does query_string), and there are ways to search both fields at the same time with all the terms in the search string, as shown in the sketch below. I think you are over-complicating the use case by indexing the full name as a single value and creating name permutations at indexing time. Keep in mind that the shingle filter only combines adjacent tokens into n-grams and never reorders them, and that the keyword tokenizer emits the whole input as a single token, so there is nothing for the filter to combine in the first place.
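
A minimal sketch of that approach (the person type and the first_name/last_name field names are assumptions, not from the original; a cross_fields multi_match with operator set to and requires every term in the query string to match in at least one of the two fields):

# Index first and last name as two separate fields
curl -XPUT "http://localhost:9200/INDEX_NAME" -d '{
  "mappings": {
    "person": {
      "properties": {
        "first_name": { "type": "string" },
        "last_name":  { "type": "string" }
      }
    }
  }
}'

# Search both fields at once, requiring all terms of the query string to match
curl -XPOST "http://localhost:9200/INDEX_NAME/person/_search" -d '{
  "query": {
    "multi_match": {
      "query": "alkis stack",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["first_name", "last_name"]
    }
  }
}'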