python-3.x redis redis-py redisearch redisjson

RedisJSON and Python3: JSON.get and ft('index').search(Query('@orig_ip:{192\.168\.210\.27}')) returning no results (matching entry in redis)


I am new to Redis. I am attempting to ingest Zeek logging data, create an index over multiple fields, and then search those fields. For the life of me, I cannot get any values to return when searching on the @orig_ip name, or when using JSON.GET to retrieve any of the id.* fields.

UPDATE: I figured this out after more troubleshooting and am updating here to help anyone else struggling with this problem.

Here is my WRONG code for creating the index:

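# Imports as laid out in redis-py 4.x/5.x (newer releases may name the indexDefinition module differently)
import redis
from redis.commands.search.field import NumericField, TagField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis()  # assumed: a local Redis instance with the search and JSON modules loaded

# Options for index creation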
index_def = IndexDefinition(
                index_type=IndexType.JSON,
                prefix = ['uid:'],
                score = 0.5,
                score_field = 'doc_score'
)

# Schema definition
schema = (  
            TagField('$.orig_l2_addr', as_name='orig_mac'),
            TagField('$.id.orig_h', as_name='orig_ip'), #Wrong field path
            TagField('$.id.resp_h', as_name='resp_ip'), #Wrong field path
            NumericField('$.orig_bytes', as_name='orig_bytes'),
            NumericField('$.resp_bytes', as_name='resp_bytes'),
            NumericField('$.ts', as_name='timestamp')
)

r.ft('py_conn_idx').create_index(schema, definition = index_def)

Here is the result I kept getting with the above WRONG schema (no results):

# Raw string so Python passes the backslashes through to Redis unchanged
search_result4 = r.ft('py_conn_idx').search(Query(r'@orig_ip:{192\.168\.210\.27}'))
Results for "@orig_ip:{192\.168\.210\.27}":
0
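For anyone wondering whether the query string itself was at fault: it was not. Punctuation inside a TAG value has to be backslash-escaped, which the query above already does. Below is a minimal sketch of a helper that builds such a query. The names escape_tag and tag_query are hypothetical, and the set of escaped characters is my assumption based on the documented tokenization rules, so check it against your Redis version.

```python
import re

# Characters treated as punctuation/separators inside a TAG value.
# Assumption: this set mirrors the documented RediSearch tokenization rules.
_TAG_SPECIALS = re.compile(r"([,.<>{}\[\]\"':;!@#$%^&*()\-+=~|/\\ ])")


def escape_tag(value: str) -> str:
    """Backslash-escape punctuation and spaces in a TAG value (hypothetical helper)."""
    return _TAG_SPECIALS.sub(r"\\\1", value)


def tag_query(field: str, value: str) -> str:
    """Build an exact-match TAG query string for one field (hypothetical helper)."""
    return "@%s:{%s}" % (field, escape_tag(value))


query_string = tag_query("orig_ip", "192.168.210.27")
print(query_string)
```

The resulting string can be passed to Query(...) in place of the hand-escaped literal.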

UPDATE: Working schema definition:

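It turns out the . in the field names was the culprit in my query failures. Zeek uses the . only as part of the field name (id.orig_h is a single flat key), not to create a nested object, but the path $.id.orig_h is interpreted as the child orig_h of an object named id, which does not exist in my documents. I needed to reference those fields in the index using bracket notation, as follows: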

# Schema definition
schema = (  
            TagField('$.orig_l2_addr', as_name='orig_mac'),
            TagField('$.["id.orig_h"]', as_name='orig_ip'), #Fixed field reference
            TagField('$.["id.resp_h"]', as_name='resp_ip'), #Fixed field reference
            NumericField('$.orig_bytes', as_name='orig_bytes'),
            NumericField('$.resp_bytes', as_name='resp_bytes'),
            NumericField('$.ts', as_name='timestamp')
)

After recreating the index with this schema, I get results with my query:

Results for "@orig_ip:{192\.168\.210\.27}":
Document {'id': 'uid:CPvYfTI4Zb1Afp2l5',....
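To illustrate why the two paths behave differently, here is a rough pure-Python sketch. It is not how RedisJSON implements JSONPath; the walk function is hypothetical and the record is a trimmed, made-up example (the resp address is a placeholder). It only shows the difference between "two steps" and "one key containing a dot".

```python
import json

# A trimmed, made-up Zeek conn record. "id.orig_h" and "id.resp_h" are flat
# keys whose names contain a dot; there is no nested "id" object.
record = json.loads(
    '{"uid": "CPvYfTI4Zb1Afp2l5", "id.orig_h": "192.168.210.27", "id.resp_h": "192.0.2.1"}'
)


def walk(doc, keys):
    """Follow keys one level at a time; return None if any step is missing."""
    current = doc
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# $.id.orig_h is read as two steps: "id", then "orig_h".
dotted = walk(record, ["id", "orig_h"])

# $.["id.orig_h"] is read as one step: the single key "id.orig_h".
bracketed = walk(record, ["id.orig_h"])

print(dotted)     # None
print(bracketed)  # 192.168.210.27
```

With the dotted path nothing matches, so nothing gets indexed under orig_ip and the search comes back empty even though the document is in Redis.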

Thanks to this Stack Overflow question for finally leading me to the cause of my troubles: How to get objects value if its name contains dots?


Solution

  • Putting this answer here so this question gets marked as having one. See the updated question/code above!