Tags: chatbot, langchain, chromadb, ollama, rag

Kernel crash while executing a vector embedding operation in ChromaDB


I'm trying to build a chatbot that runs locally with Ollama, and I'm stuck at the embedding step (ChromaDB).

When I provide the full PDF, the kernel crashes during the embedding process.

It works fine when I provide a single chapter from the book.

Problem block:

# Create the Chroma vector store
from langchain_chroma.vectorstores import Chroma
try:
    vector_db = Chroma.from_documents(
        documents=chunked_document,
        embedding=embedding_model,
        collection_name="local-rag",
        persist_directory="./db/db_nomic"
    )
    print("Embedded Documents stored in ChromaDB successfully!")
except Exception as e:
    print(f"An error occurred: {e}")


Note

embedding_model = OllamaEmbeddings(model="nomic-embed-text")

chunked_document = [Document(metadata={'source': 'xxx', 'page': 1, 'math_expressions': 'xxx'}, page_content=''), .... ]
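For context, the embedding model and the chunked documents are produced roughly along these lines; the loader and splitter settings below are illustrative placeholders, not my exact values:

# Illustrative setup only; the real loader/splitter settings differ
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings

# Load the 246-page PDF into per-page Documents
pages = PyPDFLoader("book.pdf").load()

# Split pages into smaller chunks for embedding
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunked_document = splitter.split_documents(pages)

# Local embedding model served by Ollama
embedding_model = OllamaEmbeddings(model="nomic-embed-text")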

Additional info:

Python version = 3.12.7

What I've tried so far:

# Attempt 1: add the chunks to Chroma in smaller batches instead of all at once
from langchain_chroma.vectorstores import Chroma
vector_db = Chroma(
    collection_name="local-rag",
    persist_directory="./dtbs/db_nomic",
    embedding_function=embedding_model
)
texts = [chunk.page_content for chunk in chunked_document]
metadatas = [chunk.metadata for chunk in chunked_document]
batch_size = 100
for i in range(0, len(texts), batch_size):
    batch_texts = texts[i:i+batch_size]
    batch_metadatas = metadatas[i:i+batch_size]
    vector_db.add_texts(texts=batch_texts, metadatas=batch_metadatas)
# Attempt 2: embed each chapter into its own database, then merge them into one
chapter_paths = [
    "./partial_databases/db_nomic/ch1",
    "./partial_databases/db_nomic/ch2",
    ...,
]
vector_db = Chroma(
    collection_name = "local-rag",
    persist_directory = "./db/db_nomic",
    embedding_function = embedding_model
)
# Merge documents from each chapter database into main_db
for path in chapter_paths:
    chapter_db = Chroma(
        collection_name = "local-rag",
        persist_directory=path,
        embedding_function=embedding_model
    )
    
    # Retrieve all documents (vectors) from the current chapter database
    chapter_data = chapter_db.get()    
    
    # Extract documents and metadatas
    docs = chapter_data['documents']
    metadatas = chapter_data['metadatas']
        
    vector_db.add_texts(texts=docs, metadatas=metadatas)

print("Documents successfully merged into main database.")

I'm expecting to create a vector database with ChromaDB that stores the whole PDF (246 pages).
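A minimal sanity check for that, reusing only the calls already shown above (same collection name and persist directory), is to reopen the store and compare counts:

# Compare the number of stored chunks with the number of input chunks
from langchain_chroma.vectorstores import Chroma

check_db = Chroma(
    collection_name="local-rag",
    persist_directory="./db/db_nomic",
    embedding_function=embedding_model
)
stored = check_db.get()
print(f"Stored {len(stored['ids'])} of {len(chunked_document)} chunks")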


Solution

  • These steps solved my issue:

    The problem went away after a fresh installation of the dependencies, so most probably it was caused by some internal dependency conflict (a sketch of the reinstall is below).
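
    In case it helps anyone else, the "fresh installation" was essentially a new virtual environment plus reinstalling the packages used in the question. The package list below is a sketch based on the imports above; your exact packages and versions may differ.

    # Sketch of the reinstall; package list assumed from the imports in the question
    python -m venv fresh-env
    source fresh-env/bin/activate
    pip install --upgrade pip
    pip install langchain langchain-chroma langchain-ollama langchain-community chromadb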