I'm trying to create a Qdrant vector store and add my documents to it.
My setup:
- OpenAIEmbeddings
- QdrantClient (local, in my case)
- VectorParams(size=2000, distance=Distance.EUCLID)
I'm getting the following error:
ValueError: could not broadcast input array from shape (1536,) into shape (2000,)
I understand that the error comes from how I configure VectorParams, but I don't understand how these values need to be calculated.
Here's my complete code:
```python
import os
from typing import List

from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Qdrant, VectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams


def load_documents(documents: List[Document]) -> VectorStore:
    """Create a vectorstore from documents."""
    collection_name = "my_collection"
    vectorstore_path = "data/vectorstore/qdrant"

    embeddings = OpenAIEmbeddings(
        model="text-embedding-ada-002",
        openai_api_key=os.getenv("OPENAI_API_KEY"),
    )

    qdrantClient = QdrantClient(path=vectorstore_path, prefer_grpc=True)
    qdrantClient.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=2000, distance=Distance.EUCLID),
    )

    vectorstore = Qdrant(
        client=qdrantClient,
        collection_name=collection_name,
        embeddings=embeddings,
    )

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )
    sub_docs = text_splitter.split_documents(documents)
    vectorstore.add_documents(sub_docs)

    return vectorstore
```
Any ideas on how I should configure the vector params properly?
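As far as I can tell, the value 1536 is fixed by the vector size of the model behind OpenAIEmbeddings: text-embedding-ada-002 always returns 1536-dimensional vectors, so the collection's size must match that number exactly.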
Quoting from this article: https://openai.com/blog/new-and-improved-embedding-model
The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost effective in working with vector databases.
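The error message itself looks like plain NumPy broadcasting. Presumably Qdrant's local mode stores each vector in a fixed-width NumPy array sized from VectorParams.size (an assumption about its internals, not something confirmed here). The sketch below reproduces the same message with NumPy alone: a 2000-wide slot cannot receive a 1536-element embedding, while a 1536-wide slot can.

```python
import numpy as np

# Assumption: a collection created with size=2000 reserves a 2000-wide slot per vector.
slot_2000 = np.zeros(2000, dtype=np.float32)

# text-embedding-ada-002 returns 1536 floats per input text.
embedding = np.ones(1536, dtype=np.float32)

message = ""
try:
    # Shapes (1536,) and (2000,) are not broadcast-compatible, so NumPy raises.
    slot_2000[:] = embedding
except ValueError as err:
    message = str(err)
print(message)

# With a slot that matches the embedding size, the assignment succeeds.
slot_1536 = np.zeros(1536, dtype=np.float32)
slot_1536[:] = embedding
print(slot_1536.shape)
```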
So changing VectorParams in my code from size=2000 to VectorParams(size=1536, distance=Distance.EUCLID) did the trick.