I want to find total website visits by summing up the visits of individual pages.
In my Redis, I have JSONs like:
{'page_name': 'home', 'visit_count' : 10}
{'page_name': 'add_to_cart', 'visit_count' : 7}
{'page_name': 'checkout', 'visit_count' : 5}
I want to run aggregation logic using FT.AGGREGATE so that I get total_visits = 10 + 7 + 5 = 22.
I can add necessary indexing, if required.
You can use FT.AGGREGATE for that, but you will have to index at least one field. Here is an example using your data.
Add Data
127.0.0.1:6379> JSON.SET doc:1 $ '{"page_name": "home", "visit_count" : 10}'
OK
127.0.0.1:6379> JSON.SET doc:2 $ '{"page_name": "add_to_cart", "visit_count" : 7}'
OK
127.0.0.1:6379> JSON.SET doc:3 $ '{"page_name": "checkout", "visit_count" : 5}'
OK
Create an index
At least one field needs to be indexed. Here are a few possible options:
Case 1: indexing visit_count:
FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.visit_count AS visit_count SORTABLE
Case 2: indexing page_name:
FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.page_name AS page_name TAG
Case 3: indexing page_name and referring to visit_count (shown on multiple lines for readability):
FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA
$.visit_count AS visit_count SORTABLE NOINDEX
$.page_name AS page_name TAG
Notice that visit_count is declared with NOINDEX: being SORTABLE gives quick access to its value without indexing it, so it cannot be referred to in the query part, but it can be referred to in the aggregation part.
Querying the index
The trick is to group by some dummy field that is equal in all the documents. You can use an existing one if you have one; if not, you can create one using APPLY (documentation here):
APPLY 1 AS __dummy
This adds to every document a field named __dummy with the value 1. It exists only in the query context and is not added to the JSON objects themselves. Of course, any name and value are possible.
For cases 1 and 3:
FT.AGGREGATE idx * APPLY 1 AS __dummy GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count
For case 2:
FT.AGGREGATE idx * APPLY 1 AS __dummy LOAD 3 $.visit_count AS visit_count GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count
Output (for all cases):
1) (integer) 1
2) 1) "__dummy"
   2) "1"
   3) "count"
   4) "22"
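Each result row in an FT.AGGREGATE reply is a flat list of alternating field names and values. If your client returns the raw reply, a small helper can turn each row into a dict (a pure-Python sketch; the reply literal below is just the output shown above, as strings):

```python
def parse_agg_row(row):
    """Convert a flat [name, value, name, value, ...] reply row into a dict."""
    return dict(zip(row[::2], row[1::2]))

# Raw reply shape: [<number of groups>, <row>, <row>, ...]
reply = [1, ["__dummy", "1", "count", "22"]]
rows = [parse_agg_row(r) for r in reply[1:]]
print(rows)                      # [{'__dummy': '1', 'count': '22'}]
print(int(rows[0]["count"]))     # 22
```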
EDIT
Apparently, you can GROUPBY 0 fields, so the APPLY trick is redundant.
Use
FT.AGGREGATE idx * GROUPBY 0 REDUCE SUM 1 @visit_count AS count
for cases 1 and 3, or, for case 2:
FT.AGGREGATE idx * LOAD 3 $.visit_count AS visit_count GROUPBY 0 REDUCE SUM 1 @visit_count AS count
to get the output:
1) (integer) 1
2) 1) "count"
   2) "22"
Using redis-py:
import redis
import redis.commands.search.reducers as reducers
from redis.commands.search.aggregation import AggregateRequest
from redis.commands.json.path import Path
from redis.commands.search.field import (
    NumericField,
    TagField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

conn = redis.Redis(host="localhost", port=6379, db=0)

# Case 3 index: page_name is indexed as TAG, visit_count is SORTABLE NOINDEX
definition = IndexDefinition(
    prefix=["doc:"],
    index_type=IndexType.JSON,
)
conn.ft().create_index(
    (
        TagField("$.page_name", as_name="page_name"),
        NumericField("$.visit_count", as_name="visit_count", sortable=True, no_index=True),
    ),
    definition=definition,
)

conn.json().set("doc:1", Path.root_path(), {"page_name": "home", "visit_count": 10})
conn.json().set("doc:2", Path.root_path(), {"page_name": "add_to_cart", "visit_count": 7})
conn.json().set("doc:3", Path.root_path(), {"page_name": "checkout", "visit_count": 5})

# GROUPBY 0 REDUCE SUM 1 @visit_count; each result row is [<alias>, <value>]
result = conn.ft().aggregate(AggregateRequest().group_by([], reducers.sum("@visit_count")))
total_sum = result.rows[0][1]
print(int(total_sum))  # 22
ALTERNATIVE APPROACH
If you don't need to index the documents for other reasons, you can use a Lua script to scan all the keys and sum up this value. Scanning the entire keyspace might be slow if you have a lot of keys, but it avoids the additional memory overhead and the indexing cost paid every time a key is added or modified.
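A sketch of that approach, assuming the documents live under doc:* as in the examples above. The Lua script here is illustrative (it SCANs for matching keys, fetches $.visit_count from each with JSON.GET, and accumulates the total); the pure-Python helper mirrors its accumulation step on already-fetched JSON strings:

```python
import json

# Illustrative Lua script; run server-side with: conn.eval(LUA_SUM_VISITS, 0)
# JSON.GET with a JSONPath returns a JSON array, hence the [1] after cjson.decode.
LUA_SUM_VISITS = """
local cursor = "0"
local total = 0
repeat
  local res = redis.call("SCAN", cursor, "MATCH", "doc:*")
  cursor = res[1]
  for _, key in ipairs(res[2]) do
    local values = cjson.decode(redis.call("JSON.GET", key, "$.visit_count"))
    total = total + values[1]
  end
until cursor == "0"
return total
"""

def sum_visits(json_docs):
    """Pure-Python equivalent of the script's accumulation step."""
    return sum(json.loads(d)["visit_count"] for d in json_docs)

docs = [
    '{"page_name": "home", "visit_count": 10}',
    '{"page_name": "add_to_cart", "visit_count": 7}',
    '{"page_name": "checkout", "visit_count": 5}',
]
print(sum_visits(docs))  # 22
```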