Since there is no documentation about the subject, it is very complicated to understand how to implement a custom token filter plugin from scratch in Java.
For example, I'd like an analyzer filter that returns only the tokens that are numbers.
Any idea?
There are existing filters that do this. For instance, the keep_types token filter can do exactly that.
If you leverage the <NUM> type, your custom token filter is going to only let numeric tokens through and filter out all others.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep_types",
      "types": [ "<NUM>" ]
    }
  ],
  "text": "1 quick fox 2 lazy dogs"
}
Result:
[1, 2]
You can achieve a similar result with the pattern_capture token filter as well.
But if you really want to go the Java way, then your best bet is to clone an existing analysis plugin and roll your own.
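If you do go the Java way, the heart of such a filter is just a per-token keep/drop decision. Here is a minimal sketch of that decision in plain Java, with no Lucene or Elasticsearch dependencies so it runs on its own. The names NumericTokenSketch, isNumeric and keepNumeric are made up for illustration, and the digits-only check is a simplifying assumption: it is not the same as the standard tokenizer's <NUM> type, which may classify tokens such as decimals differently. In a real plugin you would typically put a check like this in the accept() method of a class extending Lucene's FilteringTokenFilter and register a factory for it through the plugin's getTokenFilters() method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: class and method names are hypothetical, not part of
// Lucene or Elasticsearch.
class NumericTokenSketch {

    // Keep/drop decision for a single token.
    // Assumption: "numeric" means one or more digit characters and nothing else.
    static boolean isNumeric(CharSequence term) {
        if (term.length() == 0) {
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            if (!Character.isDigit(term.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for the token stream: apply the decision to already-split tokens.
    static List<String> keepNumeric(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (isNumeric(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("1", "quick", "fox", "2", "lazy", "dogs");
        System.out.println(keepNumeric(tokens)); // prints [1, 2]
    }
}
```

Keeping the decision in a small static method like this also makes it easy to unit test before you wire it into an actual plugin.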