I've added an english stemmer analyzer and filter to our query but it doesn't seem to be working correctly with plurals stemming from 'y' => 'ies'. For example, when I search 'raspberry' the results never include 'raspberries' and so on. I've tried both english and minimal_english but I still get the same result.
Here's the analyzer and settings:
analysis: {
analyzer: {
custom_analyzer: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "english_stemmer"],
},
},
filter: {
english_stemmer: {
type: "stemmer",
language: "english",
},
},
},
}
What am I doing wrong?
Though english
should work for the e.g. you mentioned, you can even go for porter_stem instead. This is equivalent to stemmer with language english.
porter_stem in action:
POST /_analyze
{
"tokenizer": "standard",
"filter": ["porter_stem"],
"text": ["raspberry", "raspberries"]
}
Response of above request:
{
"tokens" : [
{
"token" : "raspberri",
"start_offset" : 0,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "raspberri",
"start_offset" : 10,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 101
}
]
}
You can see both raspberry
and raspberries
get tokenise to raspberri
. Therefore searching for raspberry
will also match raspberries
and vice-versa.
Make sure that the field against which you are indexing and searching has defined the analyzer as custom_analyzer
(according to settings you stated in your question).
Mapping:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"english_stemmer"
]
}
},
"filter": {
"english_stemmer": {
"type": "stemmer",
"language": "english"
}
}
}
},
"mappings": {
"properties": {
"field1": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
}
Indexing:
PUT test/_doc/1
{
"field1": "raspberries"
}
PUT test/_doc/2
{
"field1": "raspberry"
}
Search:
GET test/_search
{
"query": {
"match": {
"field1": {
"query": "raspberry"
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.18232156,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.18232156,
"_source" : {
"field1" : "raspberries"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.18232156,
"_source" : {
"field1" : "raspberry"
}
}
]
}
}
You can also have a look at other stemmer kstem.