Tags: node.js, redis, embedding, cosine-similarity, similarity-search

Redis vector similarity search syntax error


I'm currently working on caching for our chatbot, and I decided to use Redis's native vector similarity search. The node-redis library provides an interface for it. LangchainJS also has caching options, but they won't work the way I need: I already have the embeddings to save and search with, so the Embedding class doesn't fit (maybe it could be made to, but only with hacky workarounds). So I decided to write a custom solution with node-redis. And here is the problem...

Their documentation has examples of how to work with KNN queries, such as FT.SEARCH idx "*=>[KNN 10 @vec $BLOB]" PARAMS 2 BLOB "\x12\xa9\xf5\x6c" DIALECT 2, and this works (at least there are no errors) when I run it in the terminal or RedisInsight. But when I try to run the same query from Node.js, it throws the error [ErrorReply: Syntax error at offset 2 near > ]

async function searchSimilar(vector: number[]) {
  const query = `* => [KNN 10 @embedding $BLOB]`;

  const options = {
    PARAMS: {
      BLOB: Buffer.from(new Float32Array(vector).buffer),
      DIALECT: 2,
    },
  };

  const result = await client.ft.search("idx:answers", query, options);

  return result;
}

Maybe I misunderstood how it works... Here is an example of how it is done in Langchain:

    async similaritySearchVectorWithScore(query, k, filter) {
        if (filter && this.filter) {
            throw new Error("cannot provide both `filter` and `this.filter`");
        }
        const _filter = filter ?? this.filter;
        const results = await this.redisClient.ft.search(this.indexName, ...this.buildQuery(query, k, _filter));
        const result = [];
        if (results.total) {
            for (const res of results.documents) {
                if (res.value) {
                    const document = res.value;
                    if (document.vector_score) {
                        result.push([
                            new Document({
                                pageContent: document[this.contentKey],
                                metadata: JSON.parse(this.unEscapeSpecialChars(document.metadata)),
                            }),
                            Number(document.vector_score),
                        ]);
                    }
                }
            }
        }
        return result;
    }

    buildQuery(query, k, filter) {
        const vectorScoreField = "vector_score";
        let hybridFields = "*";
        // if a filter is set, modify the hybrid query
        if (filter && filter.length) {
            // `filter` is a list of strings, then it's applied using the OR operator in the metadata key
            // for example: filter = ['foo', 'bar'] => this will filter all metadata containing either 'foo' OR 'bar'
            hybridFields = `@${this.metadataKey}:(${this.prepareFilter(filter)})`;
        }
        const baseQuery = `${hybridFields} => [KNN ${k} @${this.vectorKey} $vector AS ${vectorScoreField}]`;
        const returnFields = [this.metadataKey, this.contentKey, vectorScoreField];
        const options = {
            PARAMS: {
                vector: this.getFloat32Buffer(query),
            },
            RETURN: returnFields,
            SORTBY: vectorScoreField,
            DIALECT: 2,
            LIMIT: {
                from: 0,
                size: k,
            },
        };
        return [baseQuery, options];
    }
    getFloat32Buffer(vector) {
        return Buffer.from(new Float32Array(vector).buffer);
    }

I also found one more example, in Python, and I can't understand why my query doesn't work:

def create_query(
   return_fields: list,
   search_type: str="KNN",
   number_of_results: int=20,
   vector_field_name: str="img_vector",
   gender: t.Optional[str] = None,
   category: t.Optional[str] = None
):
   tag = "("
   if gender:
       tag += f"@gender:{{{gender}}}"
   if category:
       tag += f"@category:{{{category}}}"
   tag += ")"
   # if no tags are selected
   if len(tag) < 3:
       tag = "*"

   base_query = f'{tag}=>[{search_type} {number_of_results} @{vector_field_name} $vec_param AS vector_score]'
   return Query(base_query)\
       .sort_by("vector_score")\
       .paging(0, number_of_results)\
       .return_fields(*return_fields)\
       .dialect(2)

I've read the documentation, searched the internet, opened issues and discussions on GitHub, and looked through the built code of the Node.js libraries.


Solution

  • Take a look at this example. The DIALECT attribute should appear on its own, and not as part of the PARAMS, which should include only parameters for the query itself (like the BLOB in your example).

    Basically, you passed DIALECT as a parameter for the query (even though the query never uses it), while the parsing itself was done using the default dialect, which is currently 1 (it can be changed with FT.CONFIG SET DEFAULT_DIALECT <n>). KNN search is not available in dialect 1, so you get a syntax error.

    Hope that helps!
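
To make the fix concrete, here is a minimal sketch of the search arguments with DIALECT moved out of PARAMS. The helper name buildKnnSearchArgs is hypothetical (it is not part of node-redis); the index name idx:answers and the embedding field are taken from the question.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper (not part of node-redis): builds the query string and
// options object intended for client.ft.search().
function buildKnnSearchArgs(vector: number[], k: number = 10) {
  // "$BLOB" is a literal placeholder that Redis resolves from PARAMS.
  const query = `*=>[KNN ${k} @embedding $BLOB]`;

  const options = {
    // PARAMS holds only the values referenced by the query itself.
    PARAMS: {
      // Raw float32 bytes: 4 bytes per vector component.
      BLOB: Buffer.from(new Float32Array(vector).buffer),
    },
    // DIALECT is a top-level search option, a sibling of PARAMS.
    DIALECT: 2,
  };

  return { query, options };
}

// Intended usage, assuming `client` is a connected node-redis client
// with the search module available, as in the question:
//   const { query, options } = buildKnnSearchArgs(vector);
//   const result = await client.ft.search("idx:answers", query, options);
```

With DIALECT at the top level, the query is parsed with dialect 2, which is where the KNN syntax is available, as described above.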