database, vector-database, milvus

Question regarding batch Insert & Read Consistency


Context

My question is about batch inserts into the database and their consistency for readers.

In the "Strong" or "Eventual Staleness" consistency levels, if we insert data as a batch, is it possible for a reader to see only part of a batch?

In my use case, I'm doing the following:

The database will look like this:

id  message_id  vector
0   1           ...
1   1           ...
2   1           ...
3   2           ...
4   2           ...
5   2           ...

I want to ensure that when I query, I'm consistently retrieving all vectors associated with a specific message_id.

Questions

  1. In the “Strong” or “Eventual Staleness” consistency levels, is it possible to see only some vectors for a message_id, or can we assume we’ll see all of them if they are inserted as a batch?
  2. If records are inserted as a batch, is there a possibility of them being interleaved when using an auto-incrementing id, like so:

     id  message_id  vector
     0   1           ...
     1   2           ...
     2   1           ...
     3   1           ...
     4   2           ...
     5   2           ...

Solution

  • The answer is maybe, but the probability is very low.

    Since we don't have atomicity across multiple shards, a batch insert can fail partway: the data may be written to one shard's topic while the write to another shard's topic fails, which leaves the batch partially written.

    If you have only one shard, this is not a problem.

    If you have multiple shards, the rows of a successful batch still become visible at the same time because of the time tick system.

    It works like MVCC. That said, I don't think searching with strong consistency and transactions is a must for a vector DB.
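
To make the time tick point more concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Milvus code: the `Shard` class, `insert_batch`, `snapshot_read`, and the modulo routing are all made up for illustration, and the real system is considerably more involved. The assumptions are that every row of one batch carries the same write timestamp, and that a read at timestamp `read_ts` is only served once every shard's time tick has reached `read_ts`.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy stand-in for one shard (hypothetical, not a Milvus class)."""
    rows: list = field(default_factory=list)  # (write_ts, row) pairs
    time_tick: int = 0                        # highest timestamp this shard has fully consumed

    def append(self, write_ts, row):
        self.rows.append((write_ts, row))

    def advance(self, ts):
        self.time_tick = max(self.time_tick, ts)


def insert_batch(shards, batch, write_ts):
    """Route each row to a shard by primary key; the whole batch shares one timestamp."""
    for row in batch:
        # Modulo is an assumed stand-in for the real hash-based routing.
        shards[row["id"] % len(shards)].append(write_ts, row)


def snapshot_read(shards, message_id, read_ts):
    """Return the rows visible at read_ts, or None while any shard is still behind."""
    if any(s.time_tick < read_ts for s in shards):
        return None  # a real reader would wait here instead of returning
    visible = [row for s in shards for ts, row in s.rows
               if ts <= read_ts and row["message_id"] == message_id]
    return sorted(visible, key=lambda r: r["id"])


shards = [Shard(), Shard()]
batch = [{"id": i, "message_id": 1} for i in range(3)]
insert_batch(shards, batch, write_ts=10)

# Only shard 0 has caught up to timestamp 10, so the read is not served yet.
shards[0].advance(10)
partial = snapshot_read(shards, message_id=1, read_ts=10)

# Once every shard has reached timestamp 10, the whole batch appears together.
shards[1].advance(10)
complete = snapshot_read(shards, message_id=1, read_ts=10)

# A read at an older timestamp sees none of the batch rather than a fraction of it.
older = snapshot_read(shards, message_id=1, read_ts=9)
```

In this model a reader sees either all rows of the batch or none of them, which is the MVCC-like behavior described above. It does not cover the failure case where the write to one shard never succeeds; that is the low-probability partial write mentioned at the start of the answer.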