pythonpython-3.xelasticsearchelasticsearch-dsl-py

Filter data by day range in Elasticsearch using Python DSL


I wrote this below method to filter data for last 8 days

def method_one(query) -> Query:
    gte = (datetime.datetime.now() - datetime.timedelta(days=query)).date()
    lt = (datetime.datetime.now()).date()
    print(gte, lt)
    return Q(MultiMatch(
        query=filter("range", {"lastModifiedDate": {"gte": gte, "lt": lt}}
                        ),
        fields=['lastModifiedDate']
    ))

I want to filter data based on the lastModifiedDate field by forming an Elasticsearch Query object in Python. For example if I give /lastModifiedDate=8 (Rest API Call), it should return data by filtering for the last 8 days.


Solution

  • You don't need the datetime module to construct date queries in Elasticsearch -- you can use built-in date math:

    from json import dumps
    from elasticsearch_dsl.search import Search
    from elasticsearch_dsl.query import Q, MultiMatch
    
    
    def date_range_query(num_of_days):
        if not isinstance(num_of_days, int):
            raise Exception(
                'expected numeric & positive `num_of_days`, got `%s`' % str(num_of_days))
    
        return Q(
            "range",
            lastModifiedDate={
                "gte": "now-%dd" % num_of_days,
                "lt": "now"
            }
        )
    
    
    try:
        q = date_range_query(8)
        print(dumps(q.to_dict(), indent=2))
    except Exception as e:
        print(e)
    

    which'd print

    {
      "range": {
        "lastModifiedDate": {
          "gte": "now-8d",
          "lt": "now"
        }
      }
    }
    

    Alternatively, if you insisted on using datetime.date objects, you'd need to stringify the dates first. Now, when you do that with str(...), you're in fact calling .__str()__ which then calls .isoformat() and returns a string formatted as YYYY-MM-DD.

    Now, your lastModifiedDate field's mapping may have a different format. As such, it's good practice to declare the format of your range query:

    gte = (datetime.datetime.now() - datetime.timedelta(days=num_of_days)).date()
    lt = (datetime.datetime.now()).date()
    
    return Q(
        "range",
        lastModifiedDate={
            "gte": str(gte),
            "lt": str(lt),
            "format": "yyyy-MM-dd"  # keep in mind that the format in ES conforms to Java syntax, not python
        }
    )
    

    which'd produce a similar query but with concrete, spelled-out dates:

    {
      "range": {
        "lastModifiedDate": {
          "gte": "2021-02-26",
          "lt": "2021-03-06",
          "format": "yyyy-MM-dd"
        }
      }
    }