
Error Searching with DataType.BINARY_VECTOR in Milvus


I am attempting to compute Hamming distance with DataType.BINARY_VECTOR in Milvus. However, the last step, client.search(), raises an error when I search with binary vectors.

I have the code attached below.

from pymilvus import MilvusClient, DataType
from pathlib import Path
import numpy as np


DB_FILE = 'demo.db'
DIM = 4096
COLLECTION_NAME = 'dim_reduction'
METRIC_TYPE = 'HAMMING'
INDEX_TYPE = 'BIN_FLAT'
DATATYPE = DataType.BINARY_VECTOR
DTYPE = np.bool_


# Remove DB_FILE if exists
db_path = Path(DB_FILE)
if db_path.exists():
    db_path.unlink()

# Build client
client = MilvusClient(DB_FILE)

# Create schema
schema = MilvusClient.create_schema()
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name='text', datatype=DataType.VARCHAR, max_length=1024)
schema.add_field(field_name='vector', datatype=DATATYPE, dim=DIM)

# Create collection
client.create_collection(
    collection_name=COLLECTION_NAME,
    schema=schema,
)

# Insert data
data = [
    {'id': i, 'vector': np.array([1] * DIM, dtype=DTYPE), 'text': f'doc {i}'}
    for i in range(100)
]
client.insert(collection_name=COLLECTION_NAME, data=data)

# Create index
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
    field_name='vector',
    metric_type=METRIC_TYPE,
    index_type=INDEX_TYPE,
)
client.create_index(
    collection_name=COLLECTION_NAME,
    index_params=index_params,
)

# Search
search_params = {
    'metric_type': METRIC_TYPE,
    'params': {},
}

result = client.search(
    collection_name=COLLECTION_NAME,
    data=[np.array([1] * DIM, dtype=DTYPE)],
    limit=2,
    search_params=search_params,
)
print(result)

Can someone please take a look at this for me? Thanks a lot!


Solution

  • I believe the error occurs because binary vectors need to be passed as byte arrays (the 0/1 values packed eight per byte), not as NumPy boolean arrays. I found this in one of the Milvus examples, which might be helpful to you (link: https://github.com/milvus-io/pymilvus/blob/f7a4839a8a6b05620985d25cde47b63247a561e7/examples/binary_example.py#L23):

    import random

    import numpy as np


    def gen_binary_vectors(num, dim):
        raw_vectors = []
        binary_vectors = []
        for _ in range(num):
            raw_vector = [random.randint(0, 1) for _ in range(dim)]
            raw_vectors.append(raw_vector)
            # pack the 0/1 values into bits in a uint8 array, then convert that array to bytes
            binary_vectors.append(bytes(np.packbits(raw_vector, axis=-1).tolist()))
        return raw_vectors, binary_vectors
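
    Applied to the code in the question, the same packing could look roughly like the sketch below. The helper name to_binary_vector is my own (hypothetical, not part of pymilvus), and I have not run this against your exact Milvus setup, so treat it as a starting point rather than a confirmed fix.

```python
import numpy as np

DIM = 4096  # same dimension as in the question; must be a multiple of 8


def to_binary_vector(bits):
    """Pack a 1-D sequence of 0/1 (or bool) values into bytes, 8 bits per byte.

    Hypothetical helper for illustration; not part of pymilvus.
    """
    arr = np.asarray(bits, dtype=np.uint8)
    if arr.ndim != 1 or arr.size % 8 != 0:
        raise ValueError("expected a 1-D vector whose length is a multiple of 8")
    return np.packbits(arr).tobytes()


# A DIM-bit vector becomes DIM // 8 bytes.
packed = to_binary_vector(np.array([1] * DIM, dtype=np.bool_))
print(type(packed), len(packed))

# Assumed usage in the question's code (untested against Milvus here):
#   insert: {'id': i, 'vector': to_binary_vector([1] * DIM), 'text': f'doc {i}'}
#   search: data=[to_binary_vector([1] * DIM)]
```

    The idea is that both the inserted rows and the query vector go through the same packing, so the collection and the search see vectors in the same byte layout.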