I'm working on a Spanish search engine. (I don't speak Spanish) But based on my research, the goal is more or less like this: 1. filter stopwords like "dos","de","la"... 2. stem the words for both search and index. e.g If you search "primera", then "primero","primer" should also show up.
My attempt:
es_analyzer={
"settings": {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_stemmer": {
"type": "stemmer",
"language": "spanish"
}
},
"analyzer": {
"default_search": {
"type": "spanish"
},
"rebuilt_spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_stemmer"
]
}
}
}
}
}
The problem:
When I use "type":"spanish"
in the "default_search"
, my query "primera" gets stemmed to "primer", which is correct, but even though I specified to use "spanish_stemmer"
in the filter, the documents in the index aren't stemmed. So as a result when I search for "primera", it only shows exact matches for "primer". Any suggestions on fixing this?
Potential fix but I haven't figured out the syntax:
"spanish"
analyzer in filter. What's the syntax?"default_search"
. But I don't know how to use compound settings there.Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"settings": {
"analysis": {
"filter": {
"spanish_stop": {
"type": "stop",
"stopwords": "_spanish_"
},
"spanish_stemmer": {
"type": "stemmer",
"language": "spanish"
}
},
"analyzer": {
"default_search": {
"type":"spanish",
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_stemmer"
]
}
}
}
},
"mappings":{
"properties":{
"title":{
"type":"text",
"analyzer":"default_search"
}
}
}
}
Index Data:
{
"title": "primer"
}
{
"title": "primera"
}
{
"title": "primero"
}
Search Query:
{
"query":{
"match":{
"title":"primer"
}
}
}
Search Result:
"hits": [
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "3",
"_score": 0.13353139,
"_source": {
"title": "primer"
}
},
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "1",
"_score": 0.13353139,
"_source": {
"title": "primera"
}
},
{
"_index": "stof_64420517",
"_type": "_doc",
"_id": "2",
"_score": 0.13353139,
"_source": {
"title": "primero"
}
}
]