I can't figure out how to get Elasticsearch (accessed via pyes) to match the plural and singular forms of a term. For instance, when I enter Monkies, I'd like to get results back that have Belt. I've looked at "Elasticsearch not returning singular/plural matches" but can't make sense of it. Here are some curl statements:
curl -XDELETE localhost:9200/myindex
curl -XPOST localhost:9200/myindex -d '
{"index":
{ "number_of_shards": 1,
"analysis": {
"filter": {
"myfilter": {
"type" : "porter_stem",
"language" : "English"
}
},
"analyzer": {
"default" : {
"tokenizer" : "nGram",
"filter" : ["lowercase", "myfilter"]
},
"index_analyzer" : {
"tokenizer" : "nGram",
"filter" : ["lowercase", "myfilter"]
},
"search_analyzer" : {
"tokenizer" : "nGram",
"filter" : ["lowercase", "myfilter"]
}
}
}
}
}'
curl -XPUT localhost:9200/myindex/mytype/_mapping -d '{
"mytype" : {
"date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
"properties" : {
"user": {"type":"string"},
"post_date": {"type": "date"},
"message" : {"type" : "string", "analyzer": "search_analyzer"}
}
}}'
curl -XPUT 'http://localhost:9200/myindex/mytype/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "belt knife is a cool thing"
}'
curl -XPUT 'http://localhost:9200/myindex/mytype/2' -d '{
"user" : "alwild",
"post_date" : "2009-11-15T14:12:12",
"message" : "second message with nothing else"
}'
curl -XGET 'localhost:9200/myindex/mytype/_search?q=message:belts'
I've got it to the point where searching for belts gives me some results, but now it gives too many. What do I have to do to get it to return only the one entry that has "belt" in it?
By default, your query is executed against the _all field, which uses the standard analyzer, and thus you have no stemming. Try searching with a query such as name:Monkies. For production purposes, use the match query, which will correctly connect analyzers between your query and the field mapping.
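As a rough illustration of the match query suggested above, here is a minimal Python sketch that builds the request body using only the standard library. The field name message and the index name myindex are taken from the question; the helper names build_match_query and build_search_request are hypothetical, and the request is only prepared, not sent, so it runs without a live node.

```python
import json
import urllib.request


def build_match_query(field, text):
    """Return a search body whose text is analyzed with the analyzer mapped to `field`."""
    return {"query": {"match": {field: text}}}


def build_search_request(base_url, index, body):
    """Prepare (but do not send) a search request for the given index."""
    url = "{}/{}/_search".format(base_url.rstrip("/"), index)
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: field "message" and index "myindex" come from the question.
body = build_match_query("message", "belts")
request = build_search_request("http://localhost:9200", "myindex", body)

print(request.full_url)
print(json.dumps(body))

# To run the search against a live node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the match query is analyzed per field, the search term should pass through the same filters as the indexed message text, which is what lets a stemmed belts line up with a stemmed belt.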
Elasticsearch makes it very easy to compare different analysis settings, by the way. Compare:
http://localhost:9200/_analyze?text=Monkies&analyzer=standard
vs
http://localhost:9200/_analyze?text=Monkies&analyzer=snowball
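The same comparison can be scripted. The sketch below only builds the two URLs shown above; build_analyze_url is a hypothetical helper, and the query-string form of _analyze is assumed to match the Elasticsearch version in use here (newer releases may expect a JSON body instead).

```python
from urllib.parse import urlencode


def build_analyze_url(text, analyzer, base_url="http://localhost:9200"):
    """Build an _analyze URL for the given text and analyzer name."""
    query = urlencode({"text": text, "analyzer": analyzer})
    return "{}/_analyze?{}".format(base_url, query)


# Print one URL per analyzer so the token output can be compared side by side.
urls = [build_analyze_url("Monkies", name) for name in ("standard", "snowball")]
for url in urls:
    print(url)
```

Fetching each URL and comparing the returned tokens shows whether a given analyzer stems the term or leaves it untouched.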