elasticsearch, elasticsearch-rails

Incorrect Results With Punctuation - ElasticSearch


I have a name field containing part names that is indexed with the english analyzer (I have also tried the standard analyzer).

The problem is that some of my titles contain punctuation and some do not, and likewise some of my queries contain punctuation and some do not.

For example, I have the title "CenterG 5.2 Drive Belt for model number 4425". My query could look like "Centerg 5.2 belt", and if it does, my results display correctly, with "CenterG 5.2 Drive Belt for model number 4425" at the top.

However, if my query does not contain the punctuation, the product does not appear in the results. I have the same problem with titles that don't contain punctuation and queries that do. I'm not sure how this should be handled. I tried the standard analyzer, which I understand disregards punctuation, but that did not improve the results; they were roughly the same.

So, when I search for "CenterG 5.2 Belt" or "centerg 52 belt", I want the product "CenterG 5.2 Drive Belt for model number 4425" to appear at the top of my results.

Here is my mapping:

{:properties=>{:name=>{:type=>"text", :analyzer=>"english"}}}
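
One way to see why the two forms don't match is to run the title through the _analyze API (a sketch; my_index stands in for the actual index name):

    POST my_index/_analyze
    {
      "analyzer": "english",
      "text": "CenterG 5.2 Drive Belt for model number 4425"
    }

Both the english and standard analyzers are built on the standard tokenizer, which keeps 5.2 together as a single token, so the unpunctuated query term 52 never matches it (and vice versa).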

I have also tried an ngram analyzer, which did not fix the problem.

Here is my query:

    {
        query: {
            bool: {
                should: {
                    multi_match: {
                        fields: ["name"],
                        query: "#{query}"
                    }
                }
            }
        }
    }

Solution

  • This is difficult to achieve with just one field and one analyzer. The first part of your example is easy to achieve if you use a custom analyzer that replaces all dots (.) with the empty string, both at index time and at query time.

    But in your comment you mentioned that you also want a document containing PFT11473.1 to be found by the search query PFT11473, for which you need another analyzer that replaces . with a space, so that two tokens are generated, PFT11473 and 1, and either one is searchable.

    I created two fields for storing your title, using two different analyzers (each with a lowercase filter, so that queries like centerg still match CenterG), which covers both of the use cases you mentioned.

    Below is the index mapping:

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "char_filter": [
                            "replace_dots"
                        ]
                    },
                    "space_analyzer": {
                        "tokenizer": "standard",
                        "char_filter": [
                            "replace_dots_space"
                        ]
                    }
                },
                "char_filter": {
                    "replace_dots": {
                        "type": "mapping",
                        "mappings": [
                            ". =>"
                        ]
                    },
                    "replace_dots_space": {
                        "type": "mapping",
                        "mappings": [
                            ". => \\u0020"
                        ]
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "analyzer": "my_analyzer",
                    "type": "text"
                },
                "title_space": {
                    "analyzer": "space_analyzer",
                    "type": "text"
                }
            }
        }
    }
    

    And this is how I indexed one example doc:

    {
      "title" : "PFT11473.1",
      "title_space": "PFT11473.1"
    }
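
    To verify that the two fields tokenize as intended, you can run the field-level _analyze API against the index (my_index is a placeholder for the actual index name):

    POST my_index/_analyze
    {
      "field": "title",
      "text": "PFT11473.1"
    }

    POST my_index/_analyze
    {
      "field": "title_space",
      "text": "PFT11473.1"
    }

    The first call returns the single token pft114731 (dot removed), while the second returns the two tokens pft11473 and 1 (dot replaced by a space).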
    

    And the final search query:

    {
        "query": {
            "multi_match": {
                "query": "PFT11473.1",
                "fields": [
                    "title",
                    "title_space"
                ]
            }
        }
    }
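
    With this setup, a query that drops the punctuation also matches; for example, the query from your comment:

    {
        "query": {
            "multi_match": {
                "query": "PFT11473",
                "fields": [
                    "title",
                    "title_space"
                ]
            }
        }
    }

    This matches through the title_space field, and a query like centerg 52 belt would similarly match your CenterG 5.2 Drive Belt title through the title field.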