
Elasticsearch: exact document match


I want to perform an exact match in Elasticsearch, but the match should be exact with respect to the document, not the search string: the whole indexed value has to appear in the query. For example:

I created this index:

PUT /indexName
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}

POST /indexName/_doc
{
  "name": "City"
}

POST /indexName/_doc
{
  "name": "City Lab"
}

Now I want to perform searches like this:

- If I search for "City", I want to get only the first document; the second one should not match because "City" does not match the whole of "City Lab".

- If I search for "City Lab", I want to get both the second document and the first document, because the query contains each of them as a full exact match.

- If I search for "where is the City Lab", the query contains both "City" and "City Lab", so I want to get both of them as well.

- If I search for "lab", I should get 0 hits because "lab" does not fully match either document.

- If I search for "I am inside the city, then I will go to the lab", I should get only the first document: "city" is an exact match, but "Lab" does not come directly after "City" ("city ... lab"), so "City Lab" is not a match.

How can I do that in Elasticsearch?


Solution

  • This requires different analyzers for indexing and searching this field. The index analyzer collapses punctuation marks and runs of whitespace into single spaces, converts the text to lowercase, and indexes the whole value as a single term. The search analyzer applies the same space and punctuation handling, but it also generates search terms for each individual word, each pair of adjacent words, each triplet, and so on, up to the configured maximum shingle size.

    So, during indexing, "City Lab" becomes the single token "city lab" and "City" becomes "city". During searching, "City Lab" becomes the query:

    city OR lab OR city lab

    This way it matches both indexed tokens, "city lab" and "city". A search for just "lab" generates the single token "lab", which does not match anything. Here is a complete example:

    DELETE test
    PUT test
    {
      "settings": {
        "max_shingle_diff": 4,
        "analysis": {
          "char_filter": {
            "whitespace_and_punct_to_single_space": {
              "type": "pattern_replace",
              "pattern": "[\\p{Punct}\\s]+",
              "replacement": " "
            }
          },
          "filter": {
            "name_shingles": {
              "type": "shingle",
              "min_shingle_size": 2,
              "max_shingle_size": 5,
              "output_unigrams": true
            }
          },
          "analyzer": {
            "name_index_analyzer": {
              "type": "custom",
              "char_filter": [
                "whitespace_and_punct_to_single_space"
              ],
              "tokenizer": "keyword",
              "filter": [
                "lowercase"
              ]
            },
            "name_search_analyzer": {
              "type": "custom",
              "char_filter": [
                "whitespace_and_punct_to_single_space"
              ],
              "tokenizer": "whitespace",
              "filter": [
                "lowercase",
                "name_shingles"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "analyzer": "name_index_analyzer",
            "search_analyzer": "name_search_analyzer"
          }
        }
      }
    }
    
    POST test/_bulk?refresh
    {"index": {"_id": 1}}
    {"name": "City"}
    {"index": {"_id": 2}}
    {"name": "City Lab"}
    
    GET test/_search
    {
      "query": {
        "match": {
          "name": "City"
        }
      }
    }
    
    GET test/_search
    {
      "query": {
        "match": {
          "name": "City Lab"
        }
      }
    }
    
    GET test/_search
    {
      "query": {
        "match": {
          "name": "Where is the City Lab?"
        }
      }
    }
    
    GET test/_search
    {
      "query": {
        "match": {
          "name": "lab"
        }
      }
    }
    
    GET test/_search
    {
      "query": {
        "match": {
          "name": "I am inside the city, then I will go to the lab"
        }
      }
    }
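
To see why the five searches above behave the way the question asks, here is a rough pure-Python model of the two analyzers. It is only a sketch of the idea and does not call Elasticsearch: the helper names (normalize, index_token, search_tokens, matching_names) are made up for illustration, and it assumes the char filter only needs to handle ASCII punctuation and that a document matches when its single indexed token equals any one of the query's shingles.

```python
import re
import string

MAX_SHINGLE_SIZE = 5  # mirrors "max_shingle_size" in the index settings above

# Rough equivalent of the pattern_replace char filter: [\p{Punct}\s]+ -> " "
_PUNCT_OR_SPACE = re.compile("[" + re.escape(string.punctuation) + r"\s]+")


def normalize(text):
    """Collapse punctuation/whitespace runs to one space and lowercase."""
    return _PUNCT_OR_SPACE.sub(" ", text).lower()


def index_token(name):
    """Index side: keyword tokenizer, so the whole value is one token."""
    return normalize(name)


def search_tokens(query):
    """Search side: split into words, then emit every run of 1..5 adjacent words."""
    words = normalize(query).split()
    tokens = set()
    for size in range(1, MAX_SHINGLE_SIZE + 1):
        for start in range(len(words) - size + 1):
            tokens.add(" ".join(words[start:start + size]))
    return tokens


def matching_names(query, names):
    """A name matches when its single indexed token is one of the query shingles."""
    tokens = search_tokens(query)
    return [name for name in names if index_token(name) in tokens]


NAMES = ["City", "City Lab"]

if __name__ == "__main__":
    for q in [
        "City",
        "City Lab",
        "Where is the City Lab?",
        "lab",
        "I am inside the city, then I will go to the lab",
    ]:
        print(repr(q), "->", matching_names(q, NAMES))
```

In this model "city lab" is only produced when the two words are adjacent in the query, which is why the last search finds "City" but not "City Lab". It also makes one limit of the approach visible: a name longer than five words can never appear among the shingles, so max_shingle_size has to be at least as large as the longest name you expect to index.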