
Elasticsearch match with stemming


How do I do a search for a stemmed match?

For example, I currently have many documents that contain the word "skateboard" in the item_title field, but only 3 documents that contain the word "skateboards". Because of this, when I run the following search:

POST /my_index/my_type/_search
{
    "size": 100,
    "query" : {
        "multi_match": {
           "query": "skateboards",
           "fields": [ "item_title^3" ]
        }
    }
}

I only get 3 results. However, I would also like documents containing the word "skateboard" to be returned.

From what I understand of Elasticsearch, I would expect this to be done by specifying a mapping on the item_title field with an analyser that indexes the stemmed version of each word. However, I can't find any documentation on how to do this, which suggests that it's done in a different way.

Suggestions?


Solution

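  • Here's one example: define a custom analyzer that includes a stemmer token filter, and apply it to the item_title field in the mapping: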

    PUT /stem
    {
      "settings": {
        "analysis": {
          "filter": {
            "filter_stemmer": {
              "type": "stemmer",
              "language": "english"
            }
          },
          "analyzer": {
            "tags_analyzer": {
              "type": "custom",
              "filter": [
                "lowercase",
                "filter_stemmer"
              ],
              "tokenizer": "standard"
            }
          }
        }
      },
      "mappings": {
        "test": {
          "properties": {
            "item_title": {
              "analyzer": "tags_analyzer",
              "type": "text",
              "fielddata": true
            }
          }
        }
      }
    }
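    To illustrate what tags_analyzer does to each value, here is a small Python sketch of the same chain: split into word tokens (a rough stand-in for the standard tokenizer), lowercase, then stem. The stemming step is a simplifying assumption: the hypothetical porter_step_1a function implements only the plural-suffix rules from step 1a of the Porter algorithm, whereas Elasticsearch's english stemmer applies a full stemming algorithm.

```python
import re

def porter_step_1a(word: str) -> str:
    """Plural-suffix rules from step 1a of the Porter stemmer only (illustrative)."""
    if word.endswith("sses"):
        return word[:-2]          # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]          # ponies -> poni
    if word.endswith("ss"):
        return word               # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]          # skateboards -> skateboard
    return word

def tags_analyzer(text: str) -> list[str]:
    """Rough stand-in for the custom analyzer: tokenize, lowercase, stem."""
    tokens = re.findall(r"\w+", text)              # approximates the standard tokenizer
    lowered = [t.lower() for t in tokens]          # the lowercase filter
    return [porter_step_1a(t) for t in lowered]    # stand-in for filter_stemmer

for title in ["skateboards", "skateboard", "skate"]:
    print(title, "->", tags_analyzer(title))
```

    Both "skateboards" and "skateboard" come out as the single term skateboard, while "skate" is left as skate.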
    

    Index some sample docs:

    POST /stem/test/1
    {
      "item_title": "skateboards"
    }
    POST /stem/test/2
    {
      "item_title": "skateboard"
    }
    POST /stem/test/3
    {
      "item_title": "skate"
    }
    

    Perform the query:

    GET /stem/test/_search
    {
      "query": {
        "multi_match": {
          "query": "skateboards",
          "fields": [
            "item_title^3"
          ]
        }
      },
      "fielddata_fields": [
        "item_title"
      ]
    }
    

    And see the results:

      "hits": [
         {
            "_index": "stem",
            "_type": "test",
            "_id": "1",
            "_score": 1,
            "_source": {
               "item_title": "skateboards"
            },
            "fields": {
               "item_title": [
                  "skateboard"
               ]
            }
         },
         {
            "_index": "stem",
            "_type": "test",
            "_id": "2",
            "_score": 1,
            "_source": {
               "item_title": "skateboard"
            },
            "fields": {
               "item_title": [
                  "skateboard"
               ]
            }
         }
      ]
    

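    I have also added the fielddata_fields element so that you can see how the content of the field has been indexed: in both cases, the indexed term is skateboard. The match query behind multi_match analyzes the query text "skateboards" with the same tags_analyzer, so it is also reduced to skateboard and matches both documents, whereas document 3 (indexed as skate) does not match.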
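    Why documents 1 and 2 match but document 3 does not can be modelled with a toy inverted index in Python: each stored value and the query string pass through the same analyzer, and a document matches when it shares a term with the analyzed query. This is a sketch under stated assumptions, not how Lucene stores or scores anything: the analyzer only strips plural suffixes (Porter step 1a), and the docs dictionary is a hypothetical in-memory copy of the three sample documents.

```python
import re
from collections import defaultdict

def analyze(text: str) -> list[str]:
    """Toy analyzer: tokenize, lowercase, then strip plural suffixes
    (Porter step 1a only; an assumption, not Elasticsearch's full stemmer)."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token.endswith("sses") or token.endswith("ies"):
            token = token[:-2]
        elif token.endswith("s") and not token.endswith("ss"):
            token = token[:-1]
        terms.append(token)
    return terms

# Hypothetical in-memory stand-ins for the three sample documents.
docs = {"1": "skateboards", "2": "skateboard", "3": "skate"}

# Index time: map each analyzed term to the ids of the documents containing it.
inverted_index: dict[str, set[str]] = defaultdict(set)
for doc_id, title in docs.items():
    for term in analyze(title):
        inverted_index[term].add(doc_id)

def match(query: str) -> set[str]:
    """Query time: analyze the query with the same analyzer, then return the
    documents containing any query term (like a match query's default OR)."""
    hits: set[str] = set()
    for term in analyze(query):
        hits |= inverted_index.get(term, set())
    return hits

print(sorted(match("skateboards")))  # ['1', '2']
```

    If the query were not analyzed the same way, the literal term skateboards would only find document 1, which is the behaviour described in the question.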