Jest provides a brilliant async API for Elasticsearch, and we find it very useful. However, the requests it produces are sometimes slightly different from what we would expect.
Usually we didn't care, since everything worked fine, but this time it did not.
I want to create an index with a custom ngram analyzer. Following the Elasticsearch REST API docs, I make the call below:
curl -XPUT 'localhost:9200/test' --data '
{
"settings": {
"number_of_shards": 3,
"analysis": {
"filter": {
"keyword_search": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
},
"analyzer": {
"keyword": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"keyword_search"
]
}
}
}
}
}'
and then I confirm that the analyzer is configured properly using:
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
In the response I receive multiple tokens such as exp, expe, expec and so on.
Now, using the Jest client, I put the config JSON in a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute a Jest action constructed like this:
new CreateIndex.Builder(name)
.settings(
ImmutableSettings.builder()
.loadFromClasspath(
"settings.json"
).build().getAsMap()
).build();
As a result:
Primo - I checked with tcpdump that what is actually posted to Elasticsearch is (pretty printed):
{
"settings.analysis.filter.keyword_search.max_gram": "15",
"settings.analysis.filter.keyword_search.min_gram": "3",
"settings.analysis.analyzer.keyword.tokenizer": "whitespace",
"settings.analysis.filter.keyword_search.type": "edge_ngram",
"settings.number_of_shards": "3",
"settings.analysis.analyzer.keyword.filter.0": "lowercase",
"settings.analysis.analyzer.keyword.filter.1": "keyword_search",
"settings.analysis.analyzer.keyword.type": "custom"
}
Secundo - the resulting index settings are:
{
"test": {
"settings": {
"index": {
"settings": {
"analysis": {
"filter": {
"keyword_search": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "15"
}
},
"analyzer": {
"keyword": {
"filter": [
"lowercase",
"keyword_search"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
},
"number_of_shards": "3" <-- the only difference from the one created with rest call
},
"number_of_shards": "3",
"number_of_replicas": "0",
"version": {"created": "1030499"},
"uuid": "Glqf6FMuTWG5EH2jarVRWA"
}
}
}
}
Tertio - checking the analyzer with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
I get just one token!
Question 1. Why does Jest not post my original settings JSON, but a processed version instead?
Question 2. Why do the settings generated by Jest not work?
Glad you found Jest useful; please see my answer below.
Question 1. Why does Jest not post my original settings JSON, but a processed version instead?
It is not Jest but Elasticsearch's ImmutableSettings class that does that. See:
Map<String, String> test = ImmutableSettings.builder()
.loadFromSource("{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}").build().getAsMap();
System.out.println("test = " + test);
outputs:
test = {
settings.analysis.filter.keyword_search.type=edge_ngram,
settings.number_of_shards=3,
settings.analysis.analyzer.keyword.filter.0=lowercase,
settings.analysis.analyzer.keyword.filter.1=keyword_search,
settings.analysis.analyzer.keyword.type=custom,
settings.analysis.analyzer.keyword.tokenizer=whitespace,
settings.analysis.filter.keyword_search.max_gram=15,
settings.analysis.filter.keyword_search.min_gram=3
}
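To make the flattening concrete, here is a minimal plain-Java sketch of the same idea: nested objects become dotted keys, array elements get a numeric suffix, and every value ends up as a string. It is not Elasticsearch's actual implementation; the class FlattenSketch and its flatten method are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {

    /** Flattens nested maps and lists into dotted keys with string values. */
    public static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            // Nested object: append each key to the current path.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: append the element index to the current path.
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(join(prefix, String.valueOf(i)), list.get(i), out);
            }
        } else {
            // Leaf: store the value as a string under the full dotted path.
            out.put(prefix, String.valueOf(value));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("type", "custom");
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));

        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("keyword", keyword);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);

        // Every key keeps the top-level "settings." prefix after flattening.
        System.out.println(flatten(root));
    }
}
```

Note that the top-level settings element survives as a prefix on every key, which is what shows up in the tcpdump capture from the question.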
Question 2. Why do the settings generated by Jest not work?
Because your usage of settings JSON/map is not the intended case. I have created this test to reproduce your case (it's a bit long but bear with me):
@Test
public void createIndexTemp() throws IOException {
String index = "so_q_26949195";
String settingsAsString = "{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
Map<String, String> settingsAsMap = ImmutableSettings.builder()
.loadFromSource(settingsAsString).build().getAsMap();
CreateIndex createIndex = new CreateIndex.Builder(index)
.settings(settingsAsString)
.build();
JestResult result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());
Analyze analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
analyze = new Analyze.Builder()
.analyzer("keyword")
.source("Expecting single token")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);
admin().indices().delete(new DeleteIndexRequest(index)).actionGet();
createIndex = new CreateIndex.Builder(index)
.settings(settingsAsMap)
.build();
result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());
analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
}
When you run it, you'll see that in the case where settingsAsMap is used, the resulting settings are completely wrong (settings contains another settings element, which holds your JSON, whereas the two should have been merged), and so the final analyze assertion fails.
Why is this not the intended usage?
Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), it should not have the top-level element settings. If the data is not flattened, it can have that top-level element, which is why the test case with settingsAsString works.
tl;dr:
Your settings JSON should not include the top-level "settings" element if you run it through ImmutableSettings.
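If you would rather keep settings.json unchanged, one possible workaround is to strip the leading "settings." prefix from the flattened map before handing it to the builder. The sketch below uses only plain Java; StripSettingsPrefix and stripPrefix are hypothetical names invented for this example, and I have not run it against a live cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StripSettingsPrefix {

    /**
     * Returns a copy of the flattened map in which the given prefix is
     * removed from every key that starts with it. Other keys are kept as-is.
     */
    public static Map<String, String> stripPrefix(Map<String, String> flat, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            String newKey = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            out.put(newKey, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by ImmutableSettings...getAsMap().
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        flat.put("settings.analysis.analyzer.keyword.tokenizer", "whitespace");

        // Keys now start at number_of_shards / analysis, without "settings.".
        System.out.println(stripPrefix(flat, "settings."));
    }
}
```

The resulting map could then be passed to CreateIndex.Builder's settings(...) in place of settingsAsMap from the test above. Removing the top-level element from the JSON file itself is the simpler fix.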