According to the docs, this should not be possible:
"Regular expressions cannot be anchored to the beginning or end of a token."
Nevertheless, it seems to work for me:
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "(dog)$",
      "replacement": "hot$1"
    }
  ],
  "text": "dog dogs"
}
which returns:
{
"tokens" : [
{
"token" : "hotdog",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "dogs",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
}
]
}
Note that the pattern is anchored to the end of the token, and "dogs" is not replaced because it doesn't end with "dog".
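The beginning anchor appears to behave the same way. For instance, I would expect the following request (same whitespace tokenizer, but with ^ and the text "dog adog") to replace only the first token, since "adog" does not start with "dog":

GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "^(dog)",
      "replacement": "hot$1"
    }
  ],
  "text": "dog adog"
}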
So my question is: am I missing something, or is it safe to use this (and the docs are just wrong)?
It looks like the documentation is wrong here, i.e. this is an Elasticsearch documentation bug. I have looked at the Elastic code, and there is no special handling of the beginning or end of the token.
Please refer to this ES code which is used for the token filter: it simply calls the Lucene token filter, and neither the Elastic nor the Lucene code does anything special with anchors, so ^ and $ just match the start and end of each token's text.
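So it should be safe to rely on anchors. As a minimal sketch, this is how the same anchored filter could be wired into index settings; the names my-index, dog_suffix, and my_analyzer are just placeholders:

PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "dog_suffix": {
          "type": "pattern_replace",
          "pattern": "(dog)$",
          "replacement": "hot$1"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "dog_suffix" ]
        }
      }
    }
  }
}

Any field that uses my_analyzer would then get the same per-token replacement you saw in the _analyze output.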