Tags: elasticsearch, elasticsearch-dsl, elasticsearch-py, elasticsearch-dsl-py

Partial text-formatted date match with Elasticsearch regexp


I formatted a date field, dob, as text with dob.strftime("%m/%d/%Y") and stored the resulting strings in Elasticsearch 8.7.1 ("lucene_version": "9.5.0") so that I could use regexp queries for partial date matching.

Suppose the date 06/01/2023 is stored in Elasticsearch. I noticed that I was only able to get this result back when using these regexp queries:

However, queries containing /, \/, or \\/ returned no results. I double-checked the Elasticsearch docs, and / is NOT listed as a reserved character.

I have a few questions; any help would be much appreciated:

  1. Why doesn't / work as part of the regexp query?
  2. How do I correctly formulate the regexp query? I would like to find matches when the typed term is in any of the following formats:
- M/d
- M/d/YY
- M/d/YYYY
- M/dd
- M/dd/YY
- M/dd/YYYY
- M/YY
- M/YYYY
- MM/d
- MM/d/YY
- MM/d/YYYY
- MM/dd
- MM/dd/YY
- MM/dd/YYYY
- MM/YY
- MM/YYYY
  3. Do any other types of search work better than regexp?

Solution

  • According to this official doc, / is indeed a reserved character. In a JSON request body it must be written with two preceding backslashes (\\/), because the backslash is itself an escape character in JSON strings.

    The regexp query behaves differently on text and keyword fields. It is matched against the terms stored in the index, and a text field is analyzed at index time: 06/01/2023 is tokenized into separate terms and the / is dropped, so a pattern containing / can never match.
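
The effect of analysis can be approximated in plain Python. This is only a rough sketch, not Elasticsearch itself: `simulate_standard_tokens` is a hypothetical stand-in for the standard analyzer (it just splits on non-word characters), and `re.fullmatch` stands in for Lucene's regexp matching, which has to match an entire term.

```python
import re

def simulate_standard_tokens(text):
    """Hypothetical stand-in for the standard analyzer: split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def regexp_hits(pattern, terms):
    """Return the terms the pattern matches in full, as a regexp query requires."""
    return [t for t in terms if re.fullmatch(pattern, t)]

stored = "06/01/2023"
text_terms = simulate_standard_tokens(stored)  # terms indexed for a text field
keyword_terms = [stored]                       # a keyword field keeps one term

pattern = r".*6/.*2023.*"
print(text_terms)                           # the slash is gone from every term
print(regexp_hits(pattern, text_terms))     # nothing can match a pattern with "/"
print(regexp_hits(pattern, keyword_terms))  # the whole date is one term, so it matches
```

Under these assumptions, none of the text-field terms contains a slash, so only the single keyword term can satisfy the pattern.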

    A keyword field, by contrast, stores the entire string as a single, unanalyzed term (see Keyword analyzer). Searching with / in a regexp worked once I queried a keyword sub-field instead:

    {
      "mappings": {
        "properties": {
          "dob": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
    
    {
      "query": {
        "regexp": {
          "dob.raw": ".*6\\/.*2023.*"
        }
      }
    }
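
For the second question, one possible approach is to turn the typed fragment into a regexp that targets the stored MM/dd/YYYY form on dob.raw. The sketch below is an assumption-laden illustration, not a tested Elasticsearch recipe: `build_dob_regexp` and `regexp_query` are hypothetical helpers, only complete fragments such as 6/1, 6/23, or 06/01/2023 are handled, and a two-digit second part is treated as either a day or a two-digit year because M/dd and M/YY cannot be told apart.

```python
import json
import re

SEP = r"\/"           # escaped slash, as in the query above
DIGITS2 = "[0-9]{2}"  # any two digits (an unknown day, or the century of a YY year)
DIGITS4 = "[0-9]{4}"  # any four-digit year

def build_dob_regexp(term):
    """Turn a fragment like '6/1', '6/23' or '6/1/23' into a regexp for MM/dd/YYYY."""
    parts = term.strip().split("/")
    month, rest = parts[0], parts[1:]
    valid = all(p.isdigit() for p in parts) and len(month) <= 2
    if len(rest) == 2:    # month/day/year
        valid = valid and len(rest[0]) <= 2 and len(rest[1]) in (2, 4)
    elif len(rest) == 1:  # month/day or month/year
        valid = valid and len(rest[0]) in (1, 2, 4)
    else:
        valid = False
    if not valid:
        raise ValueError(f"unsupported date fragment: {term!r}")

    month = month.zfill(2)
    if len(rest) == 2:
        year = rest[1] if len(rest[1]) == 4 else DIGITS2 + rest[1]
        return SEP.join([month, rest[0].zfill(2), year])
    second = rest[0]
    if len(second) == 4:  # M/YYYY: any day
        return SEP.join([month, DIGITS2, second])
    as_day = SEP.join([month, second.zfill(2), DIGITS4])
    if len(second) == 1:  # a single digit can only be a day
        return as_day
    as_year = SEP.join([month, DIGITS2, DIGITS2 + second])
    return f"({as_day})|({as_year})"  # M/dd or M/YY: accept both readings

def regexp_query(term, field="dob.raw"):
    """Build a request body in the same shape as the query above."""
    return {"query": {"regexp": {field: build_dob_regexp(term)}}}

print(json.dumps(regexp_query("6/2023")))
print(bool(re.fullmatch(build_dob_regexp("6/23"), "06/01/2023")))
```

Serializing the body with `json.dumps` doubles each backslash, which gives the \\/ form that the JSON request needs. The `re.fullmatch` call is only a local approximation of how the pattern would be applied to a whole keyword term.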