I have data in Qdrant whose payload looks like this:
{
    "attributes": [
        {
            "attribute_value_id": 22003,
            "id": 1252,
            "key": "Environment",
            "value": "Casual/Daily",
        },
        {
            "attribute_value_id": 98763,
            "id": 1254,
            "key": "Color",
            "value": "Multicolored",
        },
        {
            "attribute_value_id": 22040,
            "id": 1255,
            "key": "Material",
            "value": "Polyester",
        },
    ],
    "brand": {
        "id": 114326,
        "logo": None,
        "slug": "happiness-istanbul-114326",
        "title": "Happiness Istanbul",
    },
}
Following the Qdrant documentation, I implemented brand filtering like this:
from qdrant_client import models

filters_list = []
if param_filters:
    brands = param_filters.get("brand_params")
    if brands:
        # Keep points whose brand.id equals any of the requested brand IDs
        filter = models.FieldCondition(
            key="brand.id",
            match=models.MatchAny(any=[int(brand) for brand in brands]),
        )
        filters_list.append(filter)
search_results = qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)
So far this works. Things get complicated when I try to filter on the "attributes" field, which is a list of dictionaries such as:
{
    "attribute_value_id": 22040,
    "id": 1255,
    "key": "Material",
    "value": "Polyester",
}
The attrs filter sent from the front end has this structure:
attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
>>> example: {'1237': ['21727', '21759'], '1254': ['52776']}
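Note that the front end sends both the attribute IDs and the value IDs as strings, while the payload stores integers. A minimal sketch of normalizing that structure before building any filter (the helper name `normalize_attrs` is hypothetical):

```python
def normalize_attrs(raw_attrs):
    """Convert {"attr_id": ["value_id", ...]} with string IDs into
    {attr_id: [value_id, ...]} with integer IDs (hypothetical helper)."""
    normalized = {}
    for attr_id, value_ids in raw_attrs.items():
        normalized[int(attr_id)] = [int(v) for v in value_ids]
    return normalized


example = {'1237': ['21727', '21759'], '1254': ['52776']}
print(normalize_attrs(example))  # {1237: [21727, 21759], 1254: [52776]}
```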
How can I filter so that each attr_id in the query filter params (here, 1237 or 1254) must exist in the attributes field with one of the attr_value_ids provided in its list (e.g. ['21727', '21759'])?
This is what I've tried so far:
if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    print("attrs from search function:", attrs)
    for attr_id, attr_value_ids in attrs.items():
        # Convert attribute value IDs to integers
        attr_value_ids = [
            int(attr_value_id) for attr_value_id in attr_value_ids
        ]
        # Add a filter for each attribute ID and its values
        filter = models.FieldCondition(
            key=f"attributes.{attr_id}.attr_value_id",
            match=models.MatchAny(any=attr_value_ids),
        )
        filters_list.append(filter)
The problem is that key=f"attributes.{attr_id}.attr_value_id" is wrong, and I do not know how to express this condition correctly.
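One likely reason the key cannot work is visible in the data itself: `attributes` is a list, so an attribute ID such as 1254 is never a key, only the value of an item's `id` field, and the value field is named `attribute_value_id`, not `attr_value_id`. The plain-Python snippet below (not Qdrant code) just makes that visible on the sample payload:

```python
# Sample items from the payload shown above.
attributes = [
    {"attribute_value_id": 22003, "id": 1252, "key": "Environment", "value": "Casual/Daily"},
    {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
    {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
]

# A path like "attributes.1254...." assumes a dict keyed by attribute ID,
# but here the attribute ID only exists as the value of each item's "id".
print(isinstance(attributes, list))  # True
by_attr_id = {item["id"]: item["attribute_value_id"] for item in attributes}
print(by_attr_id)  # {1252: 22003, 1254: 98763, 1255: 22040}
```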
UPDATE: Maybe one step closer:
I decided to flatten the data in the database to see whether that makes this easier. First, I created a new field named flattened_attributes, which looks like this:
[
    {"1237": 21720},
    {"1254": 52791},
    {"1255": 22044},
]
Before filtering, I applied the same flattening to the attrs filter sent from the front end:
if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    flattened_attr = []
    for attr_id, attr_value_ids in attrs.items():
        for attr_value_id in attr_value_ids:
            flattened_attr.append({attr_id: int(attr_value_id)})
Now I have two similar lists of dicts, and I want to keep the points whose flattened_attributes contain at least one of the dicts received from the front end (flattened_attr).
Qdrant supports a filter that checks whether the value of a key is in a list of values, as described in the docs. But I do not know how to check whether a dict exists in the flattened_attributes field in the database.
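To pin down what this update is asking for, here is the intended predicate in plain Python (not a Qdrant filter). It assumes the stored keys are strings, like the front-end attr IDs, and `has_any_pair` is a hypothetical name. Note that it is a pure OR over every requested pair, regardless of which attribute each pair belongs to:

```python
def has_any_pair(stored, requested):
    """Return True if any requested {attr_id: value_id} dict also appears
    in the stored list of single-key dicts."""
    # Turn each single-key dict into an (attr_id, value_id) tuple for set lookup
    stored_pairs = {next(iter(d.items())) for d in stored}
    return any(next(iter(d.items())) in stored_pairs for d in requested)


stored = [{"1237": 21720}, {"1254": 52791}, {"1255": 22044}]
requested = [{"1237": 21727}, {"1237": 21759}, {"1254": 52776}]
print(has_any_pair(stored, requested))           # False
print(has_any_pair(stored, [{"1254": 52791}]))   # True
```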
NOTE: The update to the question turned out to be the wrong approach (or at least I could not make it work), and I found another approach that solved the problem.
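Looking at the structure of the attributes field in the question, each item has an attribute_value_id key whose values differ between attributes: the value IDs under "Color" (attribute id 1254) are not the same as those under "Material" (attribute id 1255). Matching on attribute_value_id alone is therefore enough to identify both the attribute and its value.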
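This approach relies on attribute_value_id never being reused across different attributes. If you want to confirm that assumption holds for your products, a check along these lines could be run over the payloads. It is shown on plain dicts; fetching them from the collection is left out, and `value_ids_unique_across_attributes` is a hypothetical name:

```python
def value_ids_unique_across_attributes(payloads):
    """Return True if no attribute_value_id appears under two different attribute ids."""
    owner = {}  # attribute_value_id -> attribute id it was first seen with
    for payload in payloads:
        for item in payload.get("attributes", []):
            value_id, attr_id = item["attribute_value_id"], item["id"]
            # setdefault records the first owner; a different owner means a collision
            if owner.setdefault(value_id, attr_id) != attr_id:
                return False
    return True


products = [
    {"attributes": [
        {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
        {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
    ]},
    {"attributes": [
        {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
    ]},
]
print(value_ids_unique_across_attributes(products))  # True
```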
So, in the search function, I wrote the following code (I will go through it below):
attrs = param_filters.get("attr_params")
if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # One MatchAny condition per attr_id; all conditions are ANDed via `must`
    for attr_id, attr_value_ids in attrs.items():
        flattened_attr = []
        for attr_value_id in attr_value_ids:
            flattened_attr.append(int(attr_value_id))
        filter = models.FieldCondition(
            key="attributes[].attribute_value_id",
            match=models.MatchAny(any=flattened_attr),
        )
        filters_list.append(filter)
search_results = qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)
First, for each attr_id I created a separate list containing its attr_value_ids (converted to int, since the front end sends them as strings).
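For comparison, here is a sketch of the same conditions as a JSON-style dict in the shape Qdrant's REST API uses for filters (`must`, `key`, `match.any`). `build_attr_filter` is a hypothetical helper, and it assumes the same attrs structure as above:

```python
def build_attr_filter(attrs):
    """Build one MatchAny condition per attr_id, combined with must (hypothetical helper)."""
    conditions = []
    for attr_id, value_ids in attrs.items():
        # attr_id only groups the values; it does not appear in the key
        conditions.append({
            "key": "attributes[].attribute_value_id",
            "match": {"any": [int(v) for v in value_ids]},
        })
    return {"must": conditions}


print(build_attr_filter({'1237': ['21727', '21759'], '1254': ['52776']}))
```

Because attr_id never appears in the key, the per-attribute grouping is what produces the OR-within, AND-across behavior.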
Then, following the Qdrant documentation, I used key="attributes[].attribute_value_id" to go through the items of the attributes list and, inside each item (a dictionary), match its attribute_value_id key against the values sent.
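To make the `[]` notation concrete, here is a plain-Python approximation of how a path like `attributes[].attribute_value_id` collects values from the sample payload. It is only a model for intuition, not Qdrant's implementation, and `extract_values` is a hypothetical name:

```python
def extract_values(payload, path):
    """Collect values along a dotted path; a segment ending in "[]"
    means "descend into every element of this list"."""
    current = [payload]
    for segment in path.split("."):
        is_array = segment.endswith("[]")
        key = segment[:-2] if is_array else segment
        next_level = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue  # path does not exist on this node
            value = node[key]
            if is_array:
                if isinstance(value, list):
                    next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


payload = {
    "attributes": [
        {"attribute_value_id": 22003, "id": 1252, "key": "Environment", "value": "Casual/Daily"},
        {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
        {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
    ],
    "brand": {"id": 114326, "logo": None, "slug": "happiness-istanbul-114326", "title": "Happiness Istanbul"},
}
print(extract_values(payload, "attributes[].attribute_value_id"))  # [22003, 98763, 22040]
print(extract_values(payload, "brand.id"))  # [114326]
```

Under this model, a MatchAny condition on that key holds when at least one collected value is in its `any` list.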
Also, note that I am creating a separate filter for each attr_id:
for attr_id, attr_value_ids in attrs.items():
    flattened_attr = []
    for attr_value_id in attr_value_ids:
        flattened_attr.append(int(attr_value_id))
    filter = models.FieldCondition(
        key="attributes[].attribute_value_id",
        match=models.MatchAny(any=flattened_attr),
    )
    filters_list.append(filter)
This is because, when multiple values are sent for one attr_id, at least one of them should match (OR between attribute_value_ids), but when another attr_id is sent, both the new one and the previous one must match (AND between attr_ids). Also note that I am using must in the main filter, so each filter must be satisfied on its own, while inside each filter any of the value IDs is acceptable:
qd_client.search(
    query_filter=models.Filter(must=filters_list),
    collection_name=f"lang{lang}_products",
    query_vector=query_vector,
    search_params=models.SearchParams(hnsw_ef=128, exact=False),
    limit=limit,
)
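To double-check the AND/OR semantics described above, here is a plain-Python model of what a `Filter(must=[...])` with one `MatchAny` per attr_id should accept. It is an approximation for reasoning about the logic, not how Qdrant evaluates filters, and `matches` is a hypothetical name:

```python
def matches(payload, attrs):
    """Model of Filter(must=[one MatchAny per attr_id]) on attribute_value_id."""
    stored = {item["attribute_value_id"] for item in payload.get("attributes", [])}
    # must => every per-attribute condition has to hold (AND across attr_ids)
    return all(
        # MatchAny => at least one requested value id is present (OR within one attr_id)
        any(int(v) in stored for v in value_ids)
        for value_ids in attrs.values()
    )


payload = {"attributes": [
    {"attribute_value_id": 22003, "id": 1252, "key": "Environment", "value": "Casual/Daily"},
    {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
    {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
]}

print(matches(payload, {"1254": ["98763", "11111"]}))            # True
print(matches(payload, {"1254": ["98763"], "1255": ["22040"]}))  # True
print(matches(payload, {"1254": ["98763"], "1255": ["99999"]}))  # False
```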