Elasticsearch - How to specify the same analyzer for search and index


I'm working on a Spanish search engine, and I don't speak Spanish. Based on my research, the goal is roughly this:

  1. Filter out stopwords such as "dos", "de", "la".
  2. Stem the words at both search and index time, e.g. if you search for "primera", then "primero" and "primer" should also show up.

My attempt:

es_analyzer = {
    "settings": {
        "analysis": {
            "filter": {
                "spanish_stop": {
                    "type": "stop",
                    "stopwords": "_spanish_"
                },
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "spanish"
                }
            },
            "analyzer": {
                "default_search": {
                    "type": "spanish"
                },
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stop",
                        "spanish_stemmer"
                    ]
                }
            }
        }
    }
}

The problem: with "type": "spanish" in "default_search", my query "primera" is stemmed to "primer", which is correct. But even though I listed "spanish_stemmer" in the filter chain of "rebuilt_spanish", the documents in the index aren't stemmed. As a result, a search for "primera" only returns documents that contain exactly "primer". Any suggestions on fixing this?

Potential fixes, though I haven't figured out the syntax:

  1. Using the built-in "spanish" analyzer in the filter. What's the syntax?
  2. Adding the Spanish stemmer and stopwords to "default_search". But I don't know how to use compound settings there.

Solution

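  • Here is a working example with index mapping, index data, search query, and search result. The analyzer is assigned to the "title" field in the mapping, so it is used at both index and search time (a field's "analyzer" also serves as its search analyzer unless "search_analyzer" is set). In the question's settings, "rebuilt_spanish" is defined but never assigned to a field, so the documents are indexed with the standard analyzer.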

    Index Mapping:

    {
      "settings": {
        "analysis": {
          "filter": {
            "spanish_stop": {
              "type": "stop",
              "stopwords": "_spanish_"
            },
            "spanish_stemmer": {
              "type": "stemmer",
              "language": "spanish"
            }
          },
          "analyzer": {
            "default_search": {
              "type":"spanish",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "spanish_stop",
                "spanish_stemmer"
              ]
            }
          }
        }
      },
      "mappings":{
        "properties":{
          "title":{
            "type":"text",
            "analyzer":"default_search"
          }
        }
      }
    }
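
To see why the field-level `analyzer` matters, it can help to model which analyzer Elasticsearch picks on each side. The Python sketch below is a simplified model of the lookup order described in the Elasticsearch documentation, not client code: `resolve_analyzers` is a hypothetical helper, it ignores the per-query `analyzer` parameter, and the two bodies are reduced to the analyzer names that matter here.

```python
def resolve_analyzers(body, field):
    """Return (index_analyzer, search_analyzer) names for a text field.

    Simplified model of Elasticsearch's documented precedence; it ignores
    the per-query `analyzer` parameter.
    """
    defined = body.get("settings", {}).get("analysis", {}).get("analyzer", {})
    mapping = body.get("mappings", {}).get("properties", {}).get(field, {})

    # Index time: field analyzer -> index-level "default" -> "standard".
    index_analyzer = mapping.get("analyzer") or (
        "default" if "default" in defined else "standard"
    )

    # Search time: field search_analyzer -> field analyzer
    # -> "default_search" -> "default" -> "standard".
    search_analyzer = (
        mapping.get("search_analyzer")
        or mapping.get("analyzer")
        or next((n for n in ("default_search", "default") if n in defined), "standard")
    )
    return index_analyzer, search_analyzer


# Settings from the question: no mapping, no analyzer named "default".
question_body = {
    "settings": {"analysis": {"analyzer": {
        "default_search": {"type": "spanish"},
        "rebuilt_spanish": {"tokenizer": "standard",
                            "filter": ["lowercase", "spanish_stop", "spanish_stemmer"]},
    }}}
}

# Mapping from this answer: the field names its analyzer explicitly.
answer_body = {
    "settings": {"analysis": {"analyzer": {"default_search": {"type": "spanish"}}}},
    "mappings": {"properties": {"title": {"type": "text", "analyzer": "default_search"}}},
}

print(resolve_analyzers(question_body, "title"))  # ('standard', 'default_search')
print(resolve_analyzers(answer_body, "title"))    # ('default_search', 'default_search')
```

Under this model the question's index analyzes documents with `standard` and queries with `default_search`, which is the mismatch described in the question; the mapping above resolves to the same analyzer on both sides.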
    

    Index Data:

    {
      "title": "primer"
    }
    {
      "title": "primera"
    }
    {
      "title": "primero"
    }
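
The three documents can also be loaded in one request through the `_bulk` API, which expects newline-delimited JSON: an action line, then the document source, with a newline after the last line. A minimal Python sketch of building that payload follows. The `_id` values are taken from the search result shown in this answer, and `build_bulk_payload` is a hypothetical helper name.

```python
import json


def build_bulk_payload(docs):
    """Build an NDJSON body for the _bulk API from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document source
    return "\n".join(lines) + "\n"  # _bulk requires a final newline


docs = [
    ("1", {"title": "primera"}),
    ("2", {"title": "primero"}),
    ("3", {"title": "primer"}),
]
payload = build_bulk_payload(docs)
print(payload)
```

The payload would be sent to `POST /stof_64420517/_bulk` with the header `Content-Type: application/x-ndjson`.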
    

    Search Query:

    {
      "query":{
        "match":{
          "title":"primer"
        }
      }
    }
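
A `match` query analyzes the query text before looking it up, so it only finds documents whose indexed tokens equal the query tokens. One way to confirm that both sides agree is to run the same text through the `_analyze` API twice, once with the index-time analyzer and once with the search-time analyzer, and compare the tokens. The Python sketch below only does the comparison. The two sample responses are illustrative stand-ins shaped like `_analyze` output, reflecting the behaviour described in the question (queries stemmed to "primer", documents left unstemmed); they were not captured from a live cluster, and `tokens_of` is a hypothetical helper.

```python
def tokens_of(analyze_response):
    """Extract the token strings from an _analyze API response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


# Illustrative responses for the text "primera" (not real cluster output).
# Index side in the question: standard analyzer, so no stemming.
index_side = {"tokens": [{"token": "primera", "position": 0}]}
# Search side in the question: built-in spanish analyzer, so stemmed.
search_side = {"tokens": [{"token": "primer", "position": 0}]}

mismatch = tokens_of(index_side) != tokens_of(search_side)
print(mismatch)  # True: the query token differs from the indexed token
```

Against a real index, the request body `{"field": "title", "text": "primera"}` sent to `GET /<index>/_analyze` shows the tokens produced by the field's index-time analyzer, and `{"analyzer": "default_search", "text": "primera"}` shows those of a named analyzer.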
    

    Search Result:

    "hits": [
          {
            "_index": "stof_64420517",
            "_type": "_doc",
            "_id": "3",
            "_score": 0.13353139,
            "_source": {
              "title": "primer"
            }
          },
          {
            "_index": "stof_64420517",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.13353139,
            "_source": {
              "title": "primera"
            }
          },
          {
            "_index": "stof_64420517",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.13353139,
            "_source": {
              "title": "primero"
            }
          }
        ]
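
If every text field in the index should get the same Spanish analysis, an alternative to a per-field mapping is to register the analyzer under the reserved name `default`. With no `default_search` defined, it then applies at both index and search time. The Python sketch below builds such settings from the filters in the question. It is an untested variation, not the configuration that produced the result above, and because the question's "spanish" stemmer is not necessarily the one used inside the built-in `spanish` analyzer, the stems may differ slightly.

```python
import json

settings = {
    "analysis": {
        "filter": {
            "spanish_stop": {"type": "stop", "stopwords": "_spanish_"},
            "spanish_stemmer": {"type": "stemmer", "language": "spanish"},
        },
        "analyzer": {
            # "default" is used for indexing and, absent "default_search",
            # for searching as well.
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
            }
        },
    }
}

# Print the request body for creating the index.
print(json.dumps({"settings": settings}, indent=2))
```

The printed JSON would be the body of `PUT /<index>` when creating the index; existing documents need to be reindexed for a new analyzer to take effect.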