elasticsearch, elasticsearch-6, elasticsearch-6.8

Elasticsearch: Custom Token Filter


Since there is no documentation on the subject, it is very hard to understand how to implement a custom token filter plugin in Java from scratch.

For example, I'd like a token filter in my analyzer that keeps only the tokens that are numbers.

Any ideas?


Solution

  • There are existing filters that do this. For instance, the keep_types token filter can do exactly that.

    If you use the <NUM> type, which the standard tokenizer assigns to numeric tokens, your custom token filter will let only numeric tokens through and filter out all others.

    GET _analyze
    {
      "tokenizer": "standard",
      "filter": [
        {
          "type": "keep_types",
          "types": [ "<NUM>" ]
        }
      ],
      "text": "1 quick fox 2 lazy dogs"
    }
    

    Result:

    [1, 2]
    

    You can achieve a similar result with the pattern_capture token filter as well.

    But if you really want to go the Java way, your best bet is to clone an existing analysis plugin and roll your own.
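
    To reuse the keep_types approach on indexed documents rather than only in a one-off _analyze call, the filter can be registered under its own name in the index analysis settings and wired into a custom analyzer. This is a sketch assuming Elasticsearch 6.8; the index, filter and analyzer names (my-index, numbers_only, numbers_analyzer) are hypothetical placeholders.

    ```
    # Hypothetical names: my-index, numbers_only and numbers_analyzer are placeholders
    PUT my-index
    {
      "settings": {
        "analysis": {
          "filter": {
            "numbers_only": {
              "type": "keep_types",
              "types": [ "<NUM>" ]
            }
          },
          "analyzer": {
            "numbers_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [ "numbers_only" ]
            }
          }
        }
      }
    }

    # Run the named analyzer against the same sample text
    GET my-index/_analyze
    {
      "analyzer": "numbers_analyzer",
      "text": "1 quick fox 2 lazy dogs"
    }
    ```

    The second request should return the same two tokens, 1 and 2, as the one-off request above. A text field mapped with "analyzer": "numbers_analyzer" would then index only the numeric tokens of its content.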