I formatted a date field, dob, as text using dob.strftime("%m/%d/%Y") and stored these dates in Elasticsearch 8.7.1 ("lucene_version": "9.5.0") so that I could use regexp queries for partial date matching.
Suppose the date 06/01/2023 is stored in Elasticsearch. I was only able to get this document back with the following regexp queries:
- 06.*
- .*01.*
- .*2023
However, using /, \/, or \\/ in the query did not return any results. I double-checked Elasticsearch's documentation, and / is not listed as a reserved character.
I have a few questions; any help would be much appreciated:

Why is / not working as part of the regexp query?

[...] regexp query? I would like to find matches when the typed term is in any of the following formats:
- M/d
- M/d/YY
- M/d/YYYY
- M/dd
- M/dd/YY
- M/dd/YYYY
- M/YY
- M/YYYY
- MM/d
- MM/d/YY
- MM/d/YYYY
- MM/dd
- MM/dd/YY
- MM/dd/YYYY
- MM/YY
- MM/YYYY
[...] regexp?

According to this official doc, / is indeed a reserved character. When the request body is JSON, it must be preceded by two backslashes (\\/), because the backslash is itself the escape character in JSON strings.
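
To see where the second backslash comes from, here is a small sketch using only Python's standard json module. The variable names are illustrative, and the field name dob.raw is the keyword sub-field used further down. The regex itself needs one backslash before /, and serializing it to JSON doubles that backslash:

```python
import json

# The regex as Elasticsearch should receive it: one backslash escapes the slash.
pattern = r".*6\/.*2023.*"

# Serializing to JSON escapes the backslash itself, so it appears doubled.
body = json.dumps({"query": {"regexp": {"dob.raw": pattern}}})
print(body)

# Parsing the JSON body gives back the single-backslash regex.
decoded = json.loads(body)["query"]["regexp"]["dob.raw"]
```

A client library that builds the JSON for you performs this escaping itself, so the doubled form is only needed when the body is written by hand.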
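The regexp query works differently for text and keyword fields. Elasticsearch analyzes a text field before the regex is applied: the value is tokenized into individual words (here 06, 01, and 2023, consistent with the three patterns that did match), and the regex is tested against each token separately. No token contains /, so a pattern that includes / cannot find a match.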
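
The effect can be approximated in plain Python. This is only a rough sketch: re.findall(r"\w+", ...) stands in for the analyzer (an assumption, not the actual Elasticsearch tokenizer), re.fullmatch mimics the fact that a regexp query must match a whole term, and regexp_hits is a hypothetical helper name.

```python
import re

stored = "06/01/2023"

# Rough stand-in for an analyzed text field: one term per word-like token.
text_terms = re.findall(r"\w+", stored)
# A keyword field keeps the whole value as a single term.
keyword_terms = [stored]

def regexp_hits(pattern, terms):
    """Return the terms that the pattern matches in full."""
    return [t for t in terms if re.fullmatch(pattern, t)]

print(regexp_hits(r"06.*", text_terms))              # ['06']
print(regexp_hits(r".*6\/.*2023.*", text_terms))     # []
print(regexp_hits(r".*6\/.*2023.*", keyword_terms))  # ['06/01/2023']
```

None of the text terms contains a slash, so any pattern with / comes back empty, while the same pattern matches the single keyword term.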
A keyword field, by contrast, is not analyzed: the entire string is indexed as a single term (see Keyword analyzer). Searching with / in a regexp query worked once I queried a keyword field instead:

    dob = fields.TextField(fields={"raw": fields.KeywordField()})

    {
      "mappings": {
        "properties": {
          "dob": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }

    {
      "query": {
        "regexp": {
          "dob.raw": ".*6\\/.*2023.*"
        }
      }
    }
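
For the partial formats listed in the question, one possible approach is to build the same kind of pattern from whatever the user typed. The sketch below is an assumption on my part rather than anything Elasticsearch provides: partial_date_to_regexp is a hypothetical helper that accepts only digits separated by slashes and joins the parts the same way as the query above.

```python
import re

def partial_date_to_regexp(term: str) -> str:
    """Turn a typed term such as '6/2023' into a loose regexp pattern."""
    parts = [p for p in term.strip().split("/") if p]
    if not parts or not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        raise ValueError(f"expected digits separated by '/': {term!r}")
    # Escape each slash and allow any characters between the typed parts.
    return ".*" + r"\/.*".join(parts) + ".*"

pattern = partial_date_to_regexp("6/2023")
print(pattern)  # .*6\/.*2023.*

# Request body as a Python dict; a JSON serializer adds the second backslash.
query = {"query": {"regexp": {"dob.raw": pattern}}}

# Local sanity check against the stored value from the question.
print(bool(re.fullmatch(pattern, "06/01/2023")))  # True
```

The pattern is as loose as the query above: a term like 6/1 also matches 06/15/2021, so tighter rules (for example an optional leading zero instead of .*) may be preferable depending on how strict the matching should be.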