Indexing and searching over a nested JSON field in Redis Python


I am trying to create an index on a nested field in Redis so that I can search over it easily. The field is numeric and represents a timestamp, but I can't figure out how to index it. The documentation is quite complicated, and ever since RediSearch was merged into main Redis I've been struggling to find any good examples.

Here's my attempt:

import time
from redis import Redis
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.field import NumericField
from redis.commands.search.query import Query, NumericFilter


def main():
    r = None
    test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": str(time.time())}]}}
    test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": str(time.time() + 10)}]}}

    try:
        r = Redis()
        r.json().set("uuid:4587-7d5f9-4545", "$", test_dict1)
        r.json().set("uuid:4587-7d5f9-4546", "$", test_dict2)
        r.ft('timestamp').create_index(fields=(NumericField("$.messages.timestamp")), definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.HASH))
        print(r.json().get("uuid:4587-7d5f9-4545", "$.context.test.other"))
        q = Query("*").add_filter(NumericFilter(field="$.messages.timestamp", minval=0, maxval=time.time()))

        print(r.ft('timestamp').search(q))
    except Exception as e:
        raise e
    finally:
        if r is not None:
            r.flushall()


if __name__ == "__main__":
    main()

That currently returns 0 results, but doesn't throw any errors.


Solution

  • There are a few problems here. First, your dictionaries store the timestamps as strings, but the index declares that field as numeric. Because of the type mismatch, the documents silently fail to index. Store the timestamps as numbers instead:

        test_dict1 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext", "timestamp": time.time()}]}}
        test_dict2 = {"context": {"test": {"other": "test"}, "messages": [{"text": "mytext2", "timestamp": time.time() + 10}]}}
    

    Second, the path in your field definition is wrong: there is no JSON key at $.messages.timestamp. The timestamps live at $.context.messages.[*].timestamp, so the index definition has to use that path. For readability you may also want to give the field an alias. Finally, as @simon-prickett says, you are declaring the index over hashes while the documents are stored as JSON, so you need to declare it as a JSON index:

            r.ft('timestamp').create_index(fields=[NumericField("$.context.messages.[*].timestamp", as_name="ts")], definition=IndexDefinition(prefix=['uuid:'], index_type=IndexType.JSON))
    

    Once that's done you can query as

            q = Query("*").add_filter(NumericFilter(field="ts", minval=0, maxval=time.time()))
    

    and get your results.
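
Since a type or path mismatch does not raise an error, it can help to check the documents in plain Python before writing them to Redis. The following is only a minimal sketch under stated assumptions: `timestamps_at` and `non_numeric` are hypothetical helper names (they are not part of redis-py), and the code hard-codes the `context.messages[*].timestamp` layout from the question instead of evaluating arbitrary JSONPath.

```python
import time


def timestamps_at(doc):
    """Return every value found at context.messages[*].timestamp.

    Hypothetical helper: it hard-codes the document layout from the
    question and does not evaluate general JSONPath expressions.
    """
    messages = doc.get("context", {}).get("messages", [])
    return [m["timestamp"] for m in messages if "timestamp" in m]


def non_numeric(values):
    """Return the values that are not plain numbers (int or float)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return [v for v in values
            if isinstance(v, bool) or not isinstance(v, (int, float))]


# The document from the question: the timestamp is stored as a string.
bad_doc = {"context": {"test": {"other": "test"},
                       "messages": [{"text": "mytext",
                                     "timestamp": str(time.time())}]}}

# The corrected document: the timestamp is stored as a float.
good_doc = {"context": {"test": {"other": "test"},
                        "messages": [{"text": "mytext",
                                      "timestamp": time.time()}]}}

# The path from the question's index, $.messages.timestamp, starts at the
# wrong level: neither document has a top-level "messages" key.
print("messages" in bad_doc)                      # False

# One offending value in the original document, none in the corrected one.
print(len(non_numeric(timestamps_at(bad_doc))))   # 1
print(len(non_numeric(timestamps_at(good_doc))))  # 0
```

Running a check like this on a sample document makes both mistakes visible up front: the wrong path finds no key, and the string timestamp is flagged as non-numeric.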