elasticsearch, search, autocomplete, n-gram, shingles

Elasticsearch processor for shingles similar to split?


Is there an ingest processor that produces shingles, or can I write a custom one somehow?

In the ingest pipeline below, I split the title on whitespace, but I'd also like to combine adjacent words the way a shingle analyzer would:

PUT _ingest/pipeline/split
{
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "title_suggest.input",
        "separator": "\\s+"
      }
    }
  ]
}

Example:

The title "Senior Business Developer" should produce a suggestion field with these inputs:

  1. Senior Business Developer
  2. Business Developer
  3. Developer

Here are the links to the article and answer that inspired this question:

  1. https://blog.mimacom.com/autocomplete-elasticsearch-part3/
  2. How to combine completion, suggestion and match phrase across multiple text fields?

Solution

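  • Here is one solution I came up with, using a custom script. It emits every contiguous run of words in the title, so the three inputs from the question are included along with shorter ones such as "Senior" and "Senior Business":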

    PUT _ingest/pipeline/shingle
    {
      "description" : "Create basic shingles from the title field and store them in the title_suggest field",
      "processors" : [
        {
          "script": {
            "lang": "painless",
            "source": """
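                  // Split s on every occurrence of the delimiter character d.
                  String[] split(String s, char d) {
                    // First pass: count the delimiters to size the result array.
                    int count = 0;

                    for (char c : s.toCharArray()) {
                        if (c == d) {
                            ++count;
                        }
                    }

                    // No delimiter: the whole string is the only token.
                    if (count == 0) {
                        return new String[] {s};
                    }

                    String[] r = new String[count + 1];
                    int i0 = 0, i1 = 0; // i0: start of current token, i1: current position
                    count = 0;

                    // Second pass: cut a token at each delimiter.
                    for (char c : s.toCharArray()) {
                        if (c == d) {
                            r[count++] = s.substring(i0, i1);
                            i0 = i1 + 1;
                        }

                        ++i1;
                    }

                    // Remaining text after the last delimiter.
                    r[count] = s.substring(i0, i1);

                    return r;
                  }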
                  
                  // Skip documents whose title is missing or null.
                  if (ctx['title'] == null) { return; }
                  def title_words = split(ctx['title'], (char)' ');
                  def title_suggest = [];
                  // For each start word i, emit every shingle that begins there:
                  // word i alone, then word i extended by each following word in turn.
                  for (def i = 0; i < title_words.length; i++) {
                    def shingle = title_words[i];
                    title_suggest.add(shingle);
                    for (def j = i + 1; j < title_words.length; j++) {
                      shingle = shingle + ' ' + title_words[j];
                      title_suggest.add(shingle);
                    }
                  }
                  ctx['title_suggest'] = title_suggest;
                """
          }
        }
      ]
    }
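
  • If only the trailing shingles from the question's example are wanted (each word together with everything after it), a shorter script may be enough. The request below is a minimal sketch, not a tested production pipeline: it uses the simulate API with an inline pipeline, so nothing is stored, and it assumes that title is a string whose words are separated by single spaces, as the script above also does. The variable names inputs, pos and next are only illustrative.

    ```
    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "description": "Sketch: suffix shingles only (assumes single spaces between words)",
        "processors": [
          {
            "script": {
              "lang": "painless",
              "source": """
                // Skip documents whose title is missing or null.
                if (ctx['title'] == null) { return; }
                String t = ctx['title'].trim();
                def inputs = [];
                int pos = 0;
                // Each word start yields one input: the rest of the title from there.
                while (pos < t.length()) {
                  inputs.add(t.substring(pos));
                  int next = t.indexOf(' ', pos);
                  if (next < 0) { break; }
                  pos = next + 1;
                }
                ctx['title_suggest'] = inputs;
              """
            }
          }
        ]
      },
      "docs": [
        { "_source": { "title": "Senior Business Developer" } }
      ]
    }
    ```

    For the sample document, title_suggest in the simulated result should contain "Senior Business Developer", "Business Developer" and "Developer". Once the output looks right, the same processors block can be saved with PUT _ingest/pipeline/<name>, where the name is up to you.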