python-3.x llama-index

LlamaIndex: How to add new documents to an existing index


I'm testing a RAG system and I have this code, which takes a PDF file, creates a LanceDB vector store, and queries it:

from llama_index.core import VectorStoreIndex, Settings, StorageContext, Document, SimpleDirectoryReader, \
    load_index_from_storage
from llama_index.vector_stores.lancedb import LanceDBVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding


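lancedb = "./lancedb"  # placeholder: path to the LanceDB directory (not shown in the original post)
vector_store = LanceDBVectorStore(uri=lancedb)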
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader(input_files=["csr.pdf"]).load_data()
index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, uri=lancedb
    )
query_engine = index.as_query_engine()
response = query_engine.query("Installation Example for CSR1000V Router")
print(response)

The code works fine, but my question is: how do I add more documents to it?

I know I can pass multiple files here:

documents = SimpleDirectoryReader(input_files=["csr.pdf"]).load_data()

or even a whole folder, but I want to add documents later.

If I just write this again:

documents = SimpleDirectoryReader(input_files=["new.pdf"]).load_data()
index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, uri=lancedb
    )

This always overwrites the previous data.

So my question is: how can I keep adding more PDFs to my LanceDB store?

I have tried:

documents = SimpleDirectoryReader(input_files=["new.pdf"]).load_data()

but this overwrites the previous index.


Solution

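  • Passing mode="append" when creating the vector store did the trick (the default, mode="overwrite", replaces the existing table contents on every write):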

    vector_store_lancedb = LanceDBVectorStore(uri=LANCEDB_URI, mode="append")
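    For later runs, one way to use the append-mode store is sketched below. This is a minimal sketch, not a definitive implementation: the helper names append_pdfs and insert_all and the default "./lancedb" path are made up for illustration, and it assumes the same embedding model is configured (for example through Settings.embed_model) as when the table was first built. It reopens the existing table with mode="append", wraps it with VectorStoreIndex.from_vector_store, and inserts each new document with index.insert.

```python
from typing import Any, Iterable


def insert_all(index: Any, documents: Iterable[Any]) -> int:
    """Insert each document into an existing index; return how many were added."""
    count = 0
    for doc in documents:
        # VectorStoreIndex.insert embeds the document and writes it to the vector store.
        index.insert(doc)
        count += 1
    return count


def append_pdfs(pdf_paths: Iterable[str], uri: str = "./lancedb") -> Any:
    """Hypothetical helper: add PDFs to the LanceDB store at `uri` without overwriting it."""
    # Imported here so the module can be loaded even where llama-index is not installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.vector_stores.lancedb import LanceDBVectorStore

    # mode="append" keeps existing rows; the default mode would overwrite them.
    vector_store = LanceDBVectorStore(uri=uri, mode="append")

    # Build an index object on top of the existing store instead of calling
    # from_documents again. Assumes the embedding model matches the original one.
    index = VectorStoreIndex.from_vector_store(vector_store)

    documents = SimpleDirectoryReader(input_files=list(pdf_paths)).load_data()
    insert_all(index, documents)
    return index
```

    With this in place, a later run could call append_pdfs(["new.pdf"], uri=LANCEDB_URI) and then query the returned index with index.as_query_engine() as in the question.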