Qdrant filtering on nested object fields


I have a collection in Qdrant whose payload looks something like this:


{
    "attributes": [
        {
            "attribute_value_id": 22003,
            "id": 1252,
            "key": "Environment",
            "value": "Casual/Daily",
        },
        {
            "attribute_value_id": 98763,
            "id": 1254,
            "key": "Color",
            "value": "Multicolored",
        },
        {
            "attribute_value_id": 22040,
            "id": 1255,
            "key": "Material",
            "value": "Polyester",
        },
    ],
    "brand": {
        "id": 114326,
        "logo": None,
        "slug": "happiness-istanbul-114326",
        "title": "Happiness Istanbul",
    },
}

Following the Qdrant documentation, I implemented filtering on brand like this:

filters_list = []
if param_filters:
    brands = param_filters.get("brand_params")
    if brands:
        filter = models.FieldCondition(
            key="brand.id",
            match=models.MatchAny(any=[int(brand) for brand in brands]),
        )
        filters_list.append(filter)
    search_results = qd_client.search(
        query_filter=models.Filter(must=filters_list),
        collection_name=f"lang{lang}_products",
        query_vector=query_vector,
        search_params=models.SearchParams(hnsw_ef=128, exact=False),
        limit=limit,
    )

This works so far. But things get complicated when I try to filter on the "attributes" field. As you can see, it is a list of dictionaries like:

{
    "attribute_value_id": 22040,
    "id": 1255,
    "key": "Material",
    "value": "Polyester",
}

The attrs filter sent from the front end has this structure:

attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
>>> example: {'1237': ['21727', '21759'], '1254': ['52776']}

How can I filter to see if the provided attr_id in the query filter params (here, it is either 1237, or 1254) exists in the attributes field and has one of the attr_value_ids provided in the list (e.g. ['21727', '21759'] here)?
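
Stated as plain Python over a single payload, the desired condition can be sketched as follows. This is only a specification of the intended logic, not Qdrant code: the function name `payload_matches` is hypothetical, and combining several attr_ids with AND is an assumption taken from the solution further down.

```python
def payload_matches(payload, attrs):
    """Return True if every requested attr_id appears in payload["attributes"]
    together with one of its requested attribute_value_ids."""
    attributes = payload.get("attributes", [])
    for attr_id, attr_value_ids in attrs.items():
        wanted_values = {int(v) for v in attr_value_ids}
        # OR within one attr_id: any listed value id on that attribute is enough
        found = any(
            item.get("id") == int(attr_id)
            and item.get("attribute_value_id") in wanted_values
            for item in attributes
        )
        # AND across attr_ids (assumed): every requested attribute must be satisfied
        if not found:
            return False
    return True


payload = {
    "attributes": [
        {"attribute_value_id": 22003, "id": 1252, "key": "Environment", "value": "Casual/Daily"},
        {"attribute_value_id": 98763, "id": 1254, "key": "Color", "value": "Multicolored"},
        {"attribute_value_id": 22040, "id": 1255, "key": "Material", "value": "Polyester"},
    ]
}

print(payload_matches(payload, {"1254": ["98763", "52776"], "1255": ["22040"]}))  # True
print(payload_matches(payload, {"1237": ["21727", "21759"], "1254": ["52776"]}))  # False
```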

This is what I've tried so far:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    print("attrs from search function:", attrs)
    for attr_id, attr_value_ids in attrs.items():
        # Convert attribute value IDs to integers
        attr_value_ids = [
            int(attr_value_id) for attr_value_id in attr_value_ids
        ]
        # Add a filter for each attribute ID and its values
        filter = models.FieldCondition(
            key=f"attributes.{attr_id}.attr_value_id",
            match=models.MatchAny(any=attr_value_ids),
        )
        filters_list.append(filter)

The problem is that key=f"attributes.{attr_id}.attr_value_id" is wrong, and I do not know how to express this correctly.

UPDATE: Maybe one step closer:

I decided to flatten the data in the db to make this easier. First, I created a new field named flattened_attributes, which looks like this:

[
  {
    "1237": 21720
  },
  {
    "1254": 52791
  },
  {
    "1255": 22044
  },
]

Before filtering, I also flattened the attrs filter sent from the front end in the same way:

if attrs:
    # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
    # we need to flatten attrs to filter on payloads
    flattened_attr = []
    for attr_id, attr_value_ids in attrs.items():
        for attr_value_id in attr_value_ids:
            flattened_attr.append({attr_id: int(attr_value_id)})

Now I have two similar lists of dicts, and I want to keep the points whose flattened_attributes contains at least one of the dicts received from the front end (flattened_attr).

The docs describe (here) a condition that matches when a key's value is in a given list of values. But I do not know how to check whether a dict exists in the flattened_attributes field in the db.


Solution

  • NOTE: The update to the question turned out to be the wrong approach (or I just could not follow it through), so I came up with another approach that solved the problem.

    Looking at the structure of the attributes field in the question, each entry has an attribute_value_id key, whose value may differ between attributes (e.g. attribute 1254, "Color", versus attribute 1255, "Material").

    So, in the search function, I wrote the following code (I will go through it):

    attrs = param_filters.get("attr_params")
    if attrs:
        # attrs structure: {"attr_id": [attr_value_ids], "attr_id": [attr_value_ids]}
        # we need to flatten attrs to filter on payloads
        for attr_id, attr_value_ids in attrs.items():
            flattened_attr = []
            for attr_value_id in attr_value_ids:
                flattened_attr.append(int(attr_value_id))
    
            filter = models.FieldCondition(
                key="attributes[].attribute_value_id",
                match=models.MatchAny(any=flattened_attr),
            )
            filters_list.append(filter)
    search_results = qd_client.search(
        query_filter=models.Filter(must=filters_list),
        collection_name=f"lang{lang}_products",
        query_vector=query_vector,
        search_params=models.SearchParams(hnsw_ef=128, exact=False),
        limit=limit,
    )
    

    First, for each attr_id I created a separate list containing the attr_value_ids (I had to convert them to int).

    Then, following the Qdrant documentation (here), I used key="attributes[].attribute_value_id" to go through the items of the attributes list and, inside each item (each is a dictionary), look up the attribute_value_id key and match it against the values sent.
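
To picture what the `[]` path does, here is a plain-Python emulation of a single condition evaluated against one payload. It is a rough sketch of the semantics described above, not Qdrant's actual implementation; the helper names `project` and `match_any` are hypothetical.

```python
def project(payload, path):
    """Collect every value reachable via a dotted path.
    A segment written as 'name[]' fans out over the items of a list."""
    current = [payload]
    for segment in path.split("."):
        is_array = segment.endswith("[]")
        name = segment[:-2] if is_array else segment
        next_level = []
        for node in current:
            if not isinstance(node, dict) or name not in node:
                continue
            value = node[name]
            if is_array:
                # 'name[]': step into each element of the list
                if isinstance(value, list):
                    next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


def match_any(payload, path, wanted):
    """Emulate a MatchAny-style check: true if any projected value is in `wanted`."""
    return any(value in wanted for value in project(payload, path))


payload = {
    "attributes": [
        {"attribute_value_id": 22003, "id": 1252},
        {"attribute_value_id": 98763, "id": 1254},
        {"attribute_value_id": 22040, "id": 1255},
    ]
}

print(project(payload, "attributes[].attribute_value_id"))  # [22003, 98763, 22040]
print(match_any(payload, "attributes[].attribute_value_id", [22040, 22044]))  # True
print(match_any(payload, "attributes[].attribute_value_id", [21727, 21759]))  # False
```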

    Also, note that I am creating a separate filter for each attr_id:

    for attr_id, attr_value_ids in attrs.items():
        flattened_attr = []
        for attr_value_id in attr_value_ids:
            flattened_attr.append(int(attr_value_id))
    
        filter = models.FieldCondition(
            key="attributes[].attribute_value_id",
            match=models.MatchAny(any=flattened_attr),
        )
        filters_list.append(filter)
    

    This is because when multiple values are sent for one attr_id, at least one of them must match (OR between attribute_value_ids), but when another attr_id is sent, both it and the previous one must match (AND between attr_ids). Also note that I am using must in the main filter, so every condition has to hold, while inside each condition any of the value ids is acceptable.

    qd_client.search(
        query_filter=models.Filter(must=filters_list),
        collection_name=f"lang{lang}_products",
        query_vector=query_vector,
        search_params=models.SearchParams(hnsw_ef=128, exact=False),
        limit=limit,
    )
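
The conditions above match on attribute_value_id alone and never look at the attr_id. That is enough as long as value ids are unique across attributes, which the post implies but does not state outright. If they were not, each attr_id would have to be checked on the same list element as its value id, and Qdrant documents a nested filter condition for that case. Below is a sketch of such a filter as a REST-style body built from plain dicts. The helper name `build_nested_attr_filter` is hypothetical, and whether this fits a given Qdrant or qdrant-client version is an assumption to verify.

```python
def build_nested_attr_filter(attrs):
    """Build a REST-style filter body: one nested condition per attr_id.
    Each nested condition requires a single element of "attributes" to have
    both the given id and one of the given attribute_value_ids."""
    must = []
    for attr_id, attr_value_ids in attrs.items():
        must.append(
            {
                "nested": {
                    "key": "attributes",
                    "filter": {
                        "must": [
                            # the attribute itself, e.g. 1254 for "Color"
                            {"key": "id", "match": {"value": int(attr_id)}},
                            # OR over the value ids requested for this attribute
                            {
                                "key": "attribute_value_id",
                                "match": {"any": [int(v) for v in attr_value_ids]},
                            },
                        ]
                    },
                }
            }
        )
    # AND across attr_ids, as in the must list used above
    return {"must": must}


body = build_nested_attr_filter({"1237": ["21727", "21759"], "1254": ["52776"]})
print(len(body["must"]))  # 2
```

In qdrant-client, the typed counterparts of the "nested" entry should be models.NestedCondition and models.Nested, used in place of the plain dicts.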