Is it impossible to paginate filtered results with the Weaviate vector DB?


Query Filter Works

Using Weaviate's query filtering works fine, e.g. from their tutorial:

response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer", "round"])
    .with_where({
        "path": ["round"],
        "operator": "Equal",
        "valueText": "Double Jeopardy!"
    })
    .with_limit(20)
    .do()
)

Pagination Explodes?

But grabbing only the first 20 results is not enough for a full-featured semantic search. We need pagination to fetch the next 20 results, and the next 20, and so on. Weaviate offers cursor-based pagination for this, where you pass the UUID of the last record you retrieved:

response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer", "round"])
    .with_where({
        "path": ["round"],
        "operator": "Equal",
        "valueText": "Double Jeopardy!"
    })
    .with_after("6726aaa8-818b-49dc-8fea-9bc646ddfed6")   # <-- ID cursor pagination
    .with_limit(20)
    .do()
)

But this throws an error:

where cannot be set with after and limit parameters

The error comes from this line of Weaviate code, and the with_after() docs say it "requires limit to be set but cannot be combined with any other filters or search."

So a where filter cannot be combined with the after cursor and limit like this.

What is the correct way to do filtered query pagination?


Solution

  • You can paginate using the with_limit() and with_offset() methods together. Unlike with_after(), offset-based pagination can be combined with a with_where() filter.

    For example, let's say you have 10 records in the DB for a given SomeClass class, and you would like to retrieve the first 5 records with one query, and the next 5 with a following query.

    First query:

    client.query.get("SomeClass", "someProp").with_offset(0).with_limit(5).do()
    

    Second query:

    client.query.get("SomeClass", "someProp").with_offset(5).with_limit(5).do()
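
To tie this back to the filtered query in the question, here is a minimal sketch of a paging loop that combines with_where() with with_offset() and with_limit(). The helper name iter_filtered_pages is hypothetical (it is not part of the Weaviate client), and the sketch assumes the v3 Python client's query builder and its usual response shape, response["data"]["Get"][class_name], with failures reported under an "errors" key.

```python
def iter_filtered_pages(client, class_name, properties, where_filter, page_size=20):
    """Yield successive pages of objects that match where_filter.

    Hypothetical helper: `client` is assumed to be a v3 weaviate.Client
    (or anything exposing the same query builder methods).
    """
    offset = 0
    while True:
        response = (
            client.query
            .get(class_name, properties)
            .with_where(where_filter)
            .with_offset(offset)      # skip the results already seen
            .with_limit(page_size)    # size of this page
            .do()
        )
        # Assumption: query failures are reported under an "errors" key.
        if "errors" in response:
            raise RuntimeError(response["errors"])

        page = response["data"]["Get"][class_name]
        if not page:
            break                     # no more matches
        yield page
        if len(page) < page_size:
            break                     # short page, so this was the last one
        offset += page_size


# Usage sketch (assumes a Weaviate instance at this URL):
# import weaviate
# client = weaviate.Client("http://localhost:8080")
# where = {"path": ["round"], "operator": "Equal", "valueText": "Double Jeopardy!"}
# for page in iter_filtered_pages(client, "JeopardyQuestion",
#                                 ["question", "answer", "round"], where):
#     for obj in page:
#         print(obj["question"])
```

One caveat, as far as I know: offset plus limit is capped by the server's QUERY_MAXIMUM_RESULTS setting (10,000 by default), so very deep pages may fail unless that limit is raised.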