redis  langchain  vector-search  similarity-search

How to do hybrid search on Redis using Langchain


I'm trying to pass filters to the Redis retriever to do hybrid search on my embeddings (vector similarity plus metadata filtering). The following doesn't work: the filter is never passed through, so it is always None:

retriever = redis.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={'include_metadata': True, 'distance_threshold': 0.8, 'k': 5},
    filter="(@launch:{false} @menu_text:(%%chicken%%))"
)

I found another example, and apparently the filter expression should be passed inside search_kwargs, but I can't figure out the correct syntax. If I do it as follows:

retriever = redis.as_retriever(
    search_type="similarity_distance_threshold",
    search_kwargs={'include_metadata': True, 'distance_threshold': 0.8, 'k': 5,
                   'filter': '@menu_text:(%%chicken%%) @lunch:{true}'},
)

it generates this search query: similarity_search_by_vector > redis_query : (@content_vector:[VECTOR_RANGE $distance_threshold $vector] @menu_text:(%%chicken%%) @lunch:{true})=>{$yield_distance_as: distance}

and fails with the following error: redis.exceptions.ResponseError: Invalid attribute yield_distance_as

Any idea how to fix it?

System info: langchain 0.0.346, langchain-core 0.0.10, Python 3.9.18


Solution

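  • It was a bug in LangChain. I found that '_prepare_range_query()' generates a Redis query with the wrong syntax: it wraps the vector range clause and the filter in parentheses and then appends '=>{$yield_distance_as: distance}' to the whole group, and Redis rejects the attribute in that position. Putting the filter first, so that the attribute directly follows the VECTOR_RANGE clause, fixed the error for us: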

    def _prepare_range_query(
        self,
        k: int,
        filter: Optional[RedisFilterExpression] = None,
        return_fields: Optional[List[str]] = None,
    ) -> "Query":
        try:
            from redis.commands.search.query import Query
        except ImportError as e:
            raise ImportError(
                "Could not import redis python package. "
                "Please install it with `pip install redis`."
            ) from e
    
        return_fields = return_fields or []
        vector_key = self._schema.content_vector_key
        base_query = f"@{vector_key}:[VECTOR_RANGE $distance_threshold $vector]"
    
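        if filter:
            # Original (broken): grouping makes the trailing attribute apply
            # to the whole parenthesized expression.
            # base_query = "(" + base_query + " " + str(filter) + ")"
            # Fix: filter first, so the attribute follows the VECTOR_RANGE clause.
            base_query = str(filter) + " " + base_query

        query_string = base_query + "=>{$yield_distance_as: distance}"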
    
        return (
            Query(query_string)
            .return_fields(*return_fields)
            .sort_by("distance")
            .paging(0, k)
            .dialect(2)
        )
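
To see what the reordering changes, the string-building step can be looked at in isolation, without LangChain or a Redis server. The sketch below is only an illustration: 'build_range_query' and its 'group_with_filter' flag are hypothetical names, not part of the LangChain API. It assumes, as the error message suggests, that Redis accepts '$yield_distance_as' only when the attribute block directly follows the VECTOR_RANGE clause.

```python
from typing import Optional


def build_range_query(
    vector_key: str,
    filter_expr: Optional[str] = None,
    group_with_filter: bool = False,
) -> str:
    """Build the query string for a vector range search (hypothetical helper)."""
    range_clause = f"@{vector_key}:[VECTOR_RANGE $distance_threshold $vector]"
    attrs = "=>{$yield_distance_as: distance}"

    if not filter_expr:
        # No metadata filter: the attribute follows the range clause directly.
        return range_clause + attrs

    if group_with_filter:
        # Original behaviour: the attribute block is attached to the whole
        # parenthesized group, which produced the "Invalid attribute" error.
        return f"({range_clause} {filter_expr}){attrs}"

    # Patched behaviour: filter first, attribute right after VECTOR_RANGE.
    return f"{filter_expr} {range_clause}{attrs}"


FILTER = "@menu_text:(%%chicken%%) @lunch:{true}"
broken = build_range_query("content_vector", FILTER, group_with_filter=True)
fixed = build_range_query("content_vector", FILTER)
print(broken)
print(fixed)
```

The first printed string matches the failing query from the question; the second is the form produced by the patched method.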