This is the official list of built-in analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/7.13/analysis-analyzers.html
What surprised me is that when I used "classic" as the analyzer, it worked, and the result is actually what I want:
POST /_analyze
{
"analyzer": "classic",
"text": "this is a test 123-456-789"
}
This gives: [ test, 123-456-789 ]
I don't know what this classic analyzer is, but it fits my purpose! I want to read up on the details, but I can't find any documentation for it.
There is a "classic" tokenizer:
POST /_analyze
{
"tokenizer": "classic",
"text": "this is a test 123-456-789"
}
However, the result is different: [ this, is, a, test, 123-456-789 ]
So the question is: does anyone know what this "classic" analyzer is? And more generally, how does one check the settings of any built-in analyzer in Elasticsearch?
The ClassicAnalyzer is a native Lucene analyzer which is composed of:
- the classic tokenizer
- the classic token filter
- the lowercase token filter
- the stop token filter with a fixed set of English stop words

So the second test you're making is not complete, as it's missing the token filters. It should be like this:
POST /_analyze
{
"tokenizer": "classic",
"filter": [
"classic",
"lowercase",
"stop"
],
"text": "this is a test 123-456-789"
}
And this yields the same tokens as with the classic analyzer, i.e. [ test, 123-456-789 ]
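If you want to see what each stage of that chain contributes, one option is the "explain" parameter of the _analyze API. This is only a sketch, reusing the same tokenizer and filters as above; the exact shape of the response may differ between versions:
POST /_analyze
{
"tokenizer": "classic",
"filter": [
"classic",
"lowercase",
"stop"
],
"text": "this is a test 123-456-789",
"explain": true
}
With "explain" set to true, the response should contain a "detail" section listing the tokens produced by the tokenizer and then by each token filter in order, so you can see the stop filter dropping "this", "is" and "a". As far as I know, running it with "analyzer": "classic" instead only reports the final tokens under the analyzer name, not the individual components, which is why spelling out the chain is useful here.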