
Combining terms with synonyms - ElasticSearch


I am new to Elasticsearch and have a synonym analyzer in place, which looks like this:

{
    "settings": {  
        "index": {  
            "analysis": {  
                "filter": {  
                    "graph_synonyms": {  
                        "type": "synonym_graph",
                        "synonyms": [ 
                            "gowns, dresses",
                            "backpacks, bags", 
                            "coats, jackets"
                        ] 
                    }
                },
                "analyzer": {  
                    "search_time_analyzer": { 
                        "tokenizer": "standard", 
                        "filter": [ 
                            "lowercase",
                            "graph_synonyms" 
                        ] 
                    } 
                }
            }
        }
    }
}

The mapping looks like this:

{
    "properties": {
        "category": {  
            "type": "text",
            "search_analyzer": "search_time_analyzer",
            "fields": {
                "no_synonyms": {
                    "type": "text"
                }
            }
          }
    }
}

If I search for gowns, I correctly get results for both gowns and dresses.

The problem is that if I search for red gowns (the system does not have any red gowns), the expected behavior is to search for red dresses and return those results. Instead, it returns gowns and dresses irrespective of color.

I want to configure the system so that it considers both terms, along with their respective synonyms if any, and then returns the results.

For reference, this is what my search query looks like:

"query": 
{
    "bool": 
    {
        "should": 
        [
            {
                "multi_match":
                {
                    "boost": 300,
                    "query": term,
                    "type": "cross_fields",
                    "operator": "or",
                    "fields": ["bu.keyword^10", "bu^10", "category.keyword^8", "category^8", "category.no_synonyms^8", "brand.keyword^7", "brand^7", "colors.keyword^2", "colors^2", "size.keyword", "size", "hash.keyword^2", "hash^2", "name"]
                }
            }
        ]
    }
} 

Sample document:

_source: {
  productId: '12345',
  name: 'RUFFLE FLORAL TRIM COTTON MAXI DRESS',
  brand: [ 'self-portrait' ],
  mainImage: 'http://test.jpg',
  description: 'Self-portrait presents this maxi dress, crafted from cotton, to offer your off-duty ensembles an elegant update. Trimmed with ruffled broderie details, this piece is an effortless showcase of modern femininity.',
  status: 'active',
  bu: [ 'womenswear' ],
  category: [ 'dresses', 'gowns' ],
  tier1: [],
  tier2: [],
  colors: [ 'WHITE' ],
  size: [ '4', '6', '8', '10' ],
  hash: [
    'ballgown',   'cotton',
    'effortless', 'elegant',
    'floral',     'jar',
    'maxi',       'modern',
    'off-duty',   'ruffle',
    'ruffled',    '1',
    '2',          'crafted'
  ],
  styleCode: '211274856'
}

How can I achieve the desired output? Any help would be appreciated. Thanks


Solution

  • You can configure the synonym analyzer as the index-time analyzer instead of the search-time analyzer, like below:

    {
        "properties": {
            "category": {  
                "type": "text",
                "analyzer": "search_time_analyzer",
                "fields": {
                    "no_synonyms": {
                        "type": "text"
                    }
                }
              }
        }
    }
    

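    Once you are done with the mapping change, reindex your data and try the query below. Please note that I have changed the operator to "and" and the analyzer to "standard". Also note that the Elasticsearch documentation describes the synonym_graph filter as intended for search analyzers only and recommends the plain synonym filter for index-time synonyms, so you may want to switch the filter type as well: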

    {
      "query": {
        "multi_match": {
          "boost": 300,
          "query": "gowns red",
          "analyzer": "standard", 
          "type": "cross_fields",
          "operator": "and",
          "fields": [
            "category",
            "colors"
          ]
        }
      }
    }
    

    Why your current query is not working:

    Indexing: Your current mapping indexes data with the standard analyzer, so no synonyms are indexed for the category values.

    Searching: Your current query uses the "or" operator, so a search for red gowns builds a query like red OR gowns OR dresses and returns results irrespective of color. If you simply change the operator to "and" in the existing configuration, it will return zero results, because the query becomes something like red AND gowns AND dresses.

    Solution: Once you make the changes I suggested, synonyms are indexed for the category field as well, and the "and" operator works. The query gowns red becomes gowns AND red, which matches because, with synonyms applied at index time, the category field contains both gowns and dresses.
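The reasoning above can be sketched as a toy model in plain Python. This is only an illustration of the matching logic, not how Elasticsearch actually analyzes, executes, or scores queries. The helper names (expand, index_terms, matches) and the two sample documents are made up for this example, and the query is assumed to be analyzed by simple lowercasing and whitespace splitting.

```python
# Toy model of index-time synonyms combined with an and/or operator.
# Assumption: all names and documents here are hypothetical illustrations.
SYNONYMS = [{"gowns", "dresses"}, {"backpacks", "bags"}, {"coats", "jackets"}]


def expand(token):
    """Return the token together with the other members of its synonym group."""
    for group in SYNONYMS:
        if token in group:
            return set(group)
    return {token}


def index_terms(doc, index_synonyms):
    """Collect the lowercase terms stored for the category and colors fields."""
    terms = set()
    for value in doc["category"]:
        token = value.lower()
        terms |= expand(token) if index_synonyms else {token}
    terms |= {color.lower() for color in doc["colors"]}
    return terms


def matches(query, doc, operator, index_synonyms):
    """Check a query (no synonym expansion at search time) against one document."""
    query_terms = query.lower().split()
    terms = index_terms(doc, index_synonyms)
    hits = [term in terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)


red_dress = {"category": ["dresses"], "colors": ["RED"]}
white_gown = {"category": ["gowns"], "colors": ["WHITE"]}

# Index-time synonyms + "and": the red dress matches, the white gown does not.
print(matches("gowns red", red_dress, "and", index_synonyms=True))    # True
print(matches("gowns red", white_gown, "and", index_synonyms=True))   # False
# With "or", the white gown matches on "gowns" alone, so color is ignored.
print(matches("gowns red", white_gown, "or", index_synonyms=True))    # True
# Without index-time synonyms, "gowns" is never stored for the red dress.
print(matches("gowns red", red_dress, "and", index_synonyms=False))   # False
```

In this model the red dress is found for gowns red only when synonyms are stored at index time and every query term is required, which mirrors the configuration suggested above.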