Is there an ingest processor that produces shingles, or can I write a custom one somehow?
In the pipeline below, I split the title on whitespace, but I'd also like to combine adjacent words the way a shingle analyzer would:
PUT _ingest/pipeline/split
{
  "processors": [
    {
      "split": {
        "field": "title",
        "target_field": "title_suggest.input",
        "separator": "\\s+"
      }
    }
  ]
}
Example:
"Senior Business Developer" needs a suggestion field with these terms.
Here are the links to the article and answer that inspired this question:
Here is one solution I came up with using a custom script:
PUT _ingest/pipeline/shingle
{
  "description" : "Create basic shingles from the title field and store them in the title_suggest field",
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "source": """
          // Split s on every occurrence of the delimiter character d.
          // Consecutive delimiters produce empty strings in the result.
          String[] split(String s, char d) {
            // First pass: count the delimiters to size the result array.
            int count = 0;
            for (char c : s.toCharArray()) {
              if (c == d) {
                ++count;
              }
            }
            if (count == 0) {
              return new String[] {s};
            }
            String[] r = new String[count + 1];
            // i0 marks the start of the current word, i1 the current position.
            int i0 = 0, i1 = 0;
            count = 0;
            for (char c : s.toCharArray()) {
              if (c == d) {
                r[count++] = s.substring(i0, i1);
                i0 = i1 + 1;
              }
              ++i1;
            }
            // Add the last word, which has no trailing delimiter.
            r[count] = s.substring(i0, i1);
            return r;
          }

          // Skip documents that have no usable title.
          if (!ctx.containsKey('title') || ctx['title'] == null) { return; }

          def title_words = split(ctx['title'], (char)' ');
          def title_suggest = [];
          // For each starting word, emit every run of consecutive words that begins there.
          for (def i = 0; i < title_words.length; i++) {
            def shingle = title_words[i];
            title_suggest.add(shingle);
            for (def j = i + 1; j < title_words.length; j++) {
              shingle = shingle + ' ' + title_words[j];
              title_suggest.add(shingle);
            }
          }
          ctx['title_suggest'] = title_suggest;
        """
      }
    }
  ]
}
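To check what the script produces without indexing anything, the pipeline can be run through the simulate API. This is a minimal sketch that assumes the shingle pipeline above has already been stored; the sample document is only the example title from the question.

```
POST _ingest/pipeline/shingle/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Senior Business Developer"
      }
    }
  ]
}
```

Tracing the two loops by hand, title_suggest for this document should come back as "Senior", "Senior Business", "Senior Business Developer", "Business", "Business Developer" and "Developer", in that order. Because the script splits on a single space character, a title containing consecutive spaces would yield empty words, so it may be worth normalizing whitespace first.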