I'm testing a RAG system, and I have this code, which takes a PDF file, creates a LanceDB vector store, and queries it:
from llama_index.core import VectorStoreIndex, Settings, StorageContext, Document, SimpleDirectoryReader, \
load_index_from_storage
from llama_index.vector_stores.lancedb import LanceDBVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
vector_store = LanceDBVectorStore(lancedb)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader(input_files=["csr.pdf"]).load_data()
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context, uri=lancedb
)
query_engine = index.as_query_engine()
response = query_engine.query("Installation Example for CSR1000V Router")
print(response)
The code works fine, but my question is: how can I add more documents to it?
I know I can pass multiple files here:
documents = SimpleDirectoryReader(input_files=["csr.pdf"]).load_data()
or even a full folder, but I want to add documents later, after the index has already been built.
If I just write this again:
documents = SimpleDirectoryReader(input_files=["new.pdf"]).load_data()
index = VectorStoreIndex.from_documents(
documents, storage_context=storage_context, uri=lancedb
)
This always overwrites the previously indexed data.
So my question is: how can I keep adding more PDFs to my LanceDB store?
I have tried:
documents = SimpleDirectoryReader(input_files=["new.pdf"]).load_data()
but rebuilding the index from these documents overwrites the previous index.
The append keyword did the trick:
vector_store_lancedb = LanceDBVectorStore(uri=LANCEDB_URI, mode="append")
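For anyone who wants the whole flow in one place, here is a minimal sketch of how the append-mode store might be used to add PDFs later. The helper name append_pdfs is made up for illustration. The sketch assumes the table already exists, that it uses the default table name "vectors", and that Settings.embed_model is the same embedding model that was used when the table was first created. I have not tested it against every llama_index version, so treat it as a starting point rather than the official way.

```python
from pathlib import Path


def append_pdfs(pdf_paths, uri, table_name="vectors"):
    """Hypothetical helper: add PDFs to an existing LanceDB-backed index."""
    # Fail early, before loading any models, if an input file is missing.
    missing = [p for p in pdf_paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")

    # Imported here so the check above runs before the heavy imports.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.vector_stores.lancedb import LanceDBVectorStore

    # mode="append" asks the store to add rows instead of replacing the table.
    # table_name="vectors" is an assumption; use the name your table has.
    vector_store = LanceDBVectorStore(uri=uri, table_name=table_name, mode="append")

    # Wrap the existing table in an index without re-ingesting old documents.
    index = VectorStoreIndex.from_vector_store(vector_store)

    # Embed each new document and insert it into the store.
    for document in SimpleDirectoryReader(input_files=list(pdf_paths)).load_data():
        index.insert(document)
    return index
```

Calling append_pdfs(["new.pdf"], uri=LANCEDB_URI) should then return an index whose as_query_engine() can see both the old and the new documents.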