vector-database milvus

How to get all vector IDs from Milvus 2.0?


I used to use Milvus 1.0, where I could get all IDs by using the get_collection_stats and list_id_in_segment APIs.

Now I am trying Milvus 2.0 and want to get all IDs from it as well, but I can't find a way to do it.


Solution

  • Milvus v2.0.x supports queries with boolean expressions.
    You can use one to return all IDs by filtering on a condition that every primary key satisfies, e.g. checking that the primary key field is greater than or equal to zero (this assumes non-negative keys).
    Let's assume you are using this schema for your collection,
    adapted from https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py
    as of 3/8/2022:

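    from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

    # Connect to a local Milvus server (default host/port used by the pymilvus example)
    connections.connect("default", host="localhost", port="19530")

    dim = 8  # vector dimension, as in hello_milvus.py

    fields = [
        FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
        FieldSchema(name="random", dtype=DataType.DOUBLE),
        FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
    ]

    schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")

    hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")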
    

    Remember to insert some entities into your collection first (see the pymilvus example).
    Here you want to query all IDs (the pk field).
    You cannot currently list the IDs of a specific segment, but the following query returns all IDs in the collection.

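    # A collection must be loaded into memory before it can be queried
    hello_milvus.load()

    res = hello_milvus.query(
        expr="pk >= 0",  # matches every entity with a non-negative primary key
        output_fields=["pk", "embeddings"]
    )
    for x in res:
        print(x["pk"], x["embeddings"])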
    

    As far as I know, this is currently the only way to do it, since list_id_in_segment was removed in Milvus 2.0.
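If some primary keys could be negative, `pk >= 0` silently skips them. Below is a minimal sketch that wraps the query in a hypothetical helper `get_all_ids`. It assumes an INT64 primary key and that the Milvus 2.0 expression grammar accepts `||` as logical OR, so `pk >= 0 || pk < 0` matches every integer key. The helper name and its `pk_field` parameter are illustrative. The only pymilvus call it relies on is `Collection.query(expr=..., output_fields=...)`, and it requests just the primary key field.

```python
from typing import Any, List


def get_all_ids(collection: Any, pk_field: str = "pk") -> List[int]:
    """Return every primary key stored in a loaded Milvus collection.

    Hypothetical helper (not part of pymilvus). Assumes an INT64 primary key;
    the expression matches both non-negative and negative keys, unlike 'pk >= 0'.
    """
    expr = f"{pk_field} >= 0 || {pk_field} < 0"
    # Ask only for the primary key so the response does not carry the vectors
    rows = collection.query(expr=expr, output_fields=[pk_field])
    # Each result row is a dict keyed by field name
    return [row[pk_field] for row in rows]
```

With the collection defined above, `ids = get_all_ids(hello_milvus)` would return the IDs once the collection has been loaded. Leaving `embeddings` out of `output_fields` keeps the result small when you only need the IDs.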