node.js elasticsearch search full-text-search elastic-cloud

How to match related data when a keyword is typed incorrectly in Elasticsearch


I have a document whose title is "Hard work & Success", and I need to search for it. If I type "Hardwork" (without a space), the search returns nothing, but if I type "hard work", it returns the document.

This is the query I used:

const search = qObject.search;
const payload = {
  from: skip,
  size: limit,
  _source: [
    "id",
    "title",
    "thumbnailUrl",
    "youtubeUrl",
    "speaker",
    "standards",
    "topics",
    "schoolDetails",
    "uploadTime",
    "schoolName",
    "description",
    "studentDetails",
    "studentId"
  ],
  query: {
    bool: {
      must: {
        multi_match: {
          fields: [
            "title^2",
            "standards.standard^2",
            "speaker^2",
            "schoolDetails.schoolName^2",
            "hashtags^2",
            "topics.topic^2",
            "studentDetails.studentName^2",
          ],
          query: search,
          fuzziness: "AUTO",
        },
      },
    },
  },
};

If I search for the title "hard work" (with a space), it returns data like this:

"searchResults": [
        {
            "_id": "92",
            "_score": 19.04531,
            "_source": {
                "standards": {
                    "standard": "3",
                    "categoryType": "STANDARD",
                    "categoryId": "S3"
                },
                "schoolDetails": {
                    "categoryType": "SCHOOL",
                    "schoolId": "TPS123",
                    "schoolType": "PUBLIC",
                    "logo": "91748922mn8bo9krcx71.png",
                    "schoolName": "Carmel CMI Public School"
                },
                "studentDetails": {
                    "studentId": 270,
                    "studentDp": "164646972124244.jpg",
                    "studentName": "Nelvin",
                    "about": "good student"
                },
                "topics": {
                    "categoryType": "TOPIC",
                    "topic": "Motivation",
                    "categoryId": "MY"
                },
                "youtubeUrl": "https://www.youtube.com/watch?v=wermQ",
                "speaker": "Anna Maria Siby",
                "description": "How hardwork leads to success - motivational talk by Anna",
                "id": 92,
                "uploadTime": "2022-03-17T10:59:59.400Z",
                "title": "Hard work & Success",
            }
        },
]

If I search for the keyword "Hardwork" (without a space), this document is not found. I need the search either to treat the keyword as if it contained a space or to match related data for the keyword. Is there any solution for this? Can you please help me out?


Solution

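  • I made an example using a shingle analyzer. Fuzziness alone does not help here, because it compares individual terms by edit distance (at most 2 edits with "AUTO"), and "hardwork" is 4 edits away from either "hard" or "work".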

    Mapping:

    {
      "settings": {
        "analysis": {
          "filter": {
            "shingle_filter": {
              "type": "shingle",
              "max_shingle_size": 4,
              "min_shingle_size": 2,
              "output_unigrams": "true",
              "token_separator": ""
            }
          },
          "analyzer": {
            "shingle_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "shingle_filter"
              ]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "analyzer": "shingle_analyzer"
          }
        }
      }
    }
    

    I then tested it with your title. The token "hardwork" is generated, but so are other concatenated tokens such as "hardworksuccess" and "worksuccess", which may be a problem for you.

    GET idx-separator-words/_analyze
    {
      "analyzer": "shingle_analyzer",
      "text": ["Hard work & Success"]
    }
    

    Results:

    {
      "tokens" : [
        {
          "token" : "hard",
          "start_offset" : 0,
          "end_offset" : 4,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "hardwork",
          "start_offset" : 0,
          "end_offset" : 9,
          "type" : "shingle",
          "position" : 0,
          "positionLength" : 2
        },
        {
          "token" : "hardworksuccess",
          "start_offset" : 0,
          "end_offset" : 19,
          "type" : "shingle",
          "position" : 0,
          "positionLength" : 3
        },
        {
          "token" : "work",
          "start_offset" : 5,
          "end_offset" : 9,
          "type" : "<ALPHANUM>",
          "position" : 1
        },
        {
          "token" : "worksuccess",
          "start_offset" : 5,
          "end_offset" : 19,
          "type" : "shingle",
          "position" : 1,
          "positionLength" : 2
        },
        {
          "token" : "success",
          "start_offset" : 12,
          "end_offset" : 19,
          "type" : "<ALPHANUM>",
          "position" : 2
        }
      ]
    }
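
To make the token list above easier to reason about, here is a minimal JavaScript sketch that imitates what the shingle analyzer does to a title. It is only an approximation for illustration, not Elasticsearch code: the function name `shingleTokens` is hypothetical, and the regular expression is a rough stand-in for the standard tokenizer that only handles ASCII letters and digits.

```javascript
// Rough simulation of "shingle_analyzer": tokenize, lowercase, then emit
// unigrams plus shingles of minSize..maxSize tokens joined with no separator.
// Assumption: /[a-z0-9]+/g approximates the standard tokenizer for plain ASCII text.
function shingleTokens(text, minSize = 2, maxSize = 4) {
  // Lowercase and keep runs of letters/digits; punctuation such as "&" is dropped.
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const tokens = [];
  for (let start = 0; start < words.length; start++) {
    tokens.push(words[start]); // corresponds to output_unigrams: true
    for (let size = minSize; size <= maxSize && start + size <= words.length; size++) {
      // corresponds to token_separator: "" (adjacent words are glued together)
      tokens.push(words.slice(start, start + size).join(""));
    }
  }
  return tokens;
}

const indexedTokens = shingleTokens("Hard work & Success");
console.log(indexedTokens);
// [ 'hard', 'hardwork', 'hardworksuccess', 'work', 'worksuccess', 'success' ]

// The query "Hardwork" analyzes to the single token "hardwork",
// which is one of the tokens stored for the title.
const queryToken = shingleTokens("Hardwork")[0];
console.log(indexedTokens.includes(queryToken)); // true
```

The sketch also shows the trade-off mentioned above: every run of adjacent words is stored glued together, so longer titles produce many extra tokens. Lowering `max_shingle_size` reduces that number.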