elasticsearch elasticsearch-analyzers

Weird built-in "classic" analyzer?


This is the official list of built-in analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/7.13/analysis-analyzers.html

What surprised me is that when I used "classic" as the analyzer, it worked, and the result is actually what I want:

POST /_analyze
{
 "analyzer": "classic",
 "text": "this is a test 123-456-789"
}

This gives: [ test, 123-456-789 ]

I don't know what this classic analyzer is, but it fits my purpose! I want to read up on it, to get the details, but there is no info!

There is a "classic" tokenizer:

POST /_analyze
{
 "tokenizer": "classic",
 "text": "this is a test 123-456-789"
}

However, the result is different: [ this, is, a, test, 123-456-789 ]

So the question is: does anyone know what this "classic" analyzer is? And more generally, how does one check the settings of any built-in analyzer in Elasticsearch?


Solution

  • The ClassicAnalyzer is a native Lucene analyzer which is composed of the classic tokenizer followed by three token filters: classic, lowercase, and stop.

    So your second test is incomplete because it is missing the token filters. It should look like this:

    POST /_analyze
    {
      "tokenizer": "classic",
      "filter": [
        "classic",
        "lowercase",
        "stop"
      ],
      "text": "this is a test 123-456-789"
    }
    

    And this yields the same tokens as with the classic analyzer, i.e. [ test, 123-456-789 ]
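
    If you need to adjust this chain (for example, to swap in a different stop word list), one option is to declare it as a custom analyzer in the index settings. A minimal sketch, assuming a hypothetical index named my-index and a hypothetical analyzer named my_classic (neither name comes from the question):

    PUT /my-index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_classic": {
              "type": "custom",
              "tokenizer": "classic",
              "filter": [
                "classic",
                "lowercase",
                "stop"
              ]
            }
          }
        }
      }
    }

    POST /my-index/_analyze
    {
      "analyzer": "my_classic",
      "text": "this is a test 123-456-789"
    }

    Since this is the same tokenizer and filter chain as in the request above, it should produce the same tokens as the built-in classic analyzer.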