elasticsearch, pyes

Elasticsearch search for plurals


I can't figure out how to get Elasticsearch (accessed via pyes) to match plural and singular forms of a term. For instance, when I search for Monkies, I'd like to get back results that contain Monkey. I've looked at "Elasticsearch not returning singular/plural matches" but can't make sense of it. Here are my curl statements:

curl -XDELETE localhost:9200/myindex

curl -XPOST localhost:9200/myindex -d '
{"index":
  { "number_of_shards": 1,
    "analysis": {
      "filter": {
        "myfilter": {
          "type" : "porter_stem",
          "language" : "English"
        }
      },
      "analyzer": {
        "default" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase", "myfilter"]
        },
        "index_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase", "myfilter"]
        },
        "search_analyzer" : {
          "tokenizer" : "nGram",
          "filter" : ["lowercase", "myfilter"]
        }
      }
    }
  }
}'

curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
    "mytype" : {
        "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
        "properties" : {
            "user": {"type":"string"},
            "post_date": {"type": "date"},
            "message" : {"type" : "string", "analyzer": "search_analyzer"}
        }
    }}'

curl -XPUT 'http://localhost:9200/myindex/mytype/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "belt knife is a cool thing"
}'

curl -XPUT 'http://localhost:9200/myindex/mytype/2' -d '{
"user" : "alwild",
"post_date" : "2009-11-15T14:12:12",
"message" : "second message with nothing else"
}'

curl -XGET 'localhost:9200/myindex/mytype/_search?q=message:belts'

I've gotten to the point where searching for belts returns some results, but now it returns too many. What do I need to do so that it returns only the one document that contains "belt"?


Solution

  • If you don't specify a field, your query runs against the _all field, which uses the standard analyzer, so no stemming is applied. Search a specific field instead, with a query such as name:Monkies. For production use, prefer the match query, which analyzes the query text with the same analyzer that the field mapping uses.

    By the way, the _analyze API makes it easy to compare how different analyzers tokenize the same text. Compare:

    http://localhost:9200/_analyze?text=Monkies&analyzer=standard
    

    vs

    http://localhost:9200/_analyze?text=Monkies&analyzer=snowball
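A minimal Python sketch of both suggestions, assuming the index, type and `message` field from the question and an older Elasticsearch release that accepts `text` as a URL parameter to `_analyze`, as the URLs above do. It only builds the request body and URLs with the standard library; it does not use pyes or send anything. The helper names `match_query`, `analyze_url` and `tokens_from_analyze`, and the `BASE_URL` constant, are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed address of a local node (hypothetical; adjust to your setup).
BASE_URL = "http://localhost:9200"


def match_query(field, text):
    """Build a search body that uses the match query on one field.

    The match query analyzes `text` with the analyzer mapped for `field`,
    so any stemming configured on that field also applies to the query terms.
    """
    return {"query": {"match": {field: text}}}


def analyze_url(text, analyzer):
    """Build an _analyze URL of the same shape as the ones in the answer."""
    query = urlencode({"text": text, "analyzer": analyzer})
    return f"{BASE_URL}/_analyze?{query}"


def tokens_from_analyze(response):
    """Extract the token strings from a parsed _analyze JSON response."""
    return [t["token"] for t in response.get("tokens", [])]


if __name__ == "__main__":
    # Body to POST to /myindex/mytype/_search instead of using ?q=...
    print(json.dumps(match_query("message", "belts")))
    # URLs for comparing two analyzers on the same input
    for name in ("standard", "snowball"):
        print(analyze_url("Monkies", name))
```

Feeding the parsed JSON from each `_analyze` URL into `tokens_from_analyze` lets you compare whether an analyzer stems the term. The exact tokens depend on your Elasticsearch version, so check them against your own node.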