I'm using Pydantic to define a model where one of the fields, embedding, is expected to be a list[float]. However, I want to be able to pass a string to this field and have a validator transform that string into a list[float] before initialization.
Here's the code I'm working with:
from pydantic import BaseModel, field_validator
import uuid


class ChunkInsert(BaseModel):
    embedding: list[float]
    file_id: uuid.UUID

    @field_validator(
        "embedding",
        mode="before",
    )
    @classmethod
    def embed_files(cls, value: str) -> list[float]:
        # embed_text is my own helper (not shown); [0] takes the first embedding
        return embed_text(value)[0]


chunk_in = ChunkInsert(
    embedding="a",
    file_id=uuid.UUID("987f5c8a-5577-4662-be1d-cb1ba016f6f5"),
)
The code works as expected: embed_files processes the string and converts it into a list[float]. However, I'm getting the following type error in VS Code from Pylance:
Argument of type "Literal['a']" cannot be assigned to parameter "embedding" of type "list[float]" in function "__init__"
  "Literal['a']" is incompatible with "list[float]"  Pylance(reportArgumentType)
It seems like Pylance is not recognizing that the embedding field should be processed by the embed_files validator before the type check.
So my question is: is there a way to configure Pydantic or Pylance so that this kind of pre-initialization validation doesn't trigger a type error?
Edit: since Pylance is a static type checker and I'm changing the type dynamically before the model is created, is this even possible?
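For reference, here is a self-contained sketch of the question's setup that runs without the real embedding code. embed_text below is a hypothetical stub (it returns one vector per input string, wrapped in a list, to match the [0] indexing), and the received list is only there to record what the validator sees. Unlike the question's version, this sketch also lets an already-computed list pass through unchanged. It shows that a mode="before" validator gets the raw string at runtime, while Pylance only sees the annotated field type in the generated __init__ signature.

```python
import uuid

from pydantic import BaseModel, field_validator

received: list[object] = []  # records what the validator was given


def embed_text(text: str) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding call (one vector per input)."""
    return [[float(ord(ch)) for ch in text]]


class ChunkInsert(BaseModel):
    embedding: list[float]
    file_id: uuid.UUID

    @field_validator("embedding", mode="before")
    @classmethod
    def embed_files(cls, value: object) -> object:
        received.append(value)
        # Embed raw strings; let already-computed lists through unchanged.
        if isinstance(value, str):
            return embed_text(value)[0]
        return value


chunk_in = ChunkInsert(
    embedding="a",  # Pylance still flags this argument; the runtime is fine
    file_id=uuid.UUID("987f5c8a-5577-4662-be1d-cb1ba016f6f5"),
)
print(received)            # ['a']
print(chunk_in.embedding)  # [97.0]
```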
Here is one solution that probably does what you want. Note that if you want to calculate the embedding only when it's accessed, you could turn the property into a cached_property and calculate it there. I'm also assuming you might sometimes want to pass in the embedding yourself, so I've included that in the solution:
from typing import Self

from pydantic import BaseModel, Field, computed_field, model_validator


class ChunkInsert(BaseModel):
    text: str
    # optional precomputed embedding; excluded from model_dump() and repr
    embedding_: list[float] | None = Field(default=None, exclude=True, repr=False)

    @computed_field
    @property
    def embedding(self) -> list[float]:
        # the validator below guarantees this is set
        assert self.embedding_
        return self.embedding_

    @model_validator(mode="after")
    def embed_files(self) -> Self:
        if not self.embedding_:
            # placeholder: compute the real embedding from self.text here
            self.embedding_ = [1.0]
        return self


# all of these make the typechecker happy
print(ChunkInsert(text="foobar"))
print(ChunkInsert(text="moodbar", embedding_=[1.0]))
print(ChunkInsert(text="foobar").model_dump())
print(ChunkInsert(text="moobar", embedding_=[1.0]).model_dump())
print(ChunkInsert(text="moodbar", embedding_=[1.0]).embedding[0])

# output:
# text='foobar' embedding=[1.0]
# text='moodbar' embedding=[1.0]
# {'text': 'foobar', 'embedding': [1.0]}
# {'text': 'moobar', 'embedding': [1.0]}
# 1.0
For anyone who wants a slightly less correct but less verbose solution, the following idea would do the same:
from typing import Self

from pydantic import BaseModel, model_validator


class ChunkInsert(BaseModel):
    text: str
    # Pydantic copies mutable defaults per instance, so [] is safe here
    embedding: list[float] = []

    @model_validator(mode="after")
    def embed_files(self) -> Self:
        if not self.embedding:
            # placeholder: compute the real embedding from self.text here
            self.embedding = [1.0]
        return self


# same happiness, same output
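Both versions use [1.0] as a placeholder for the real embedding. Below is a sketch of the shorter variant wired to an embedding function, where embed_text is a hypothetical stub. It shows that an embedding passed in explicitly skips the call. Note that, as written, an explicitly passed empty list is treated the same as no embedding at all, because the check is a truthiness test.

```python
from pydantic import BaseModel, model_validator

calls: list[str] = []  # records each (hypothetical) embedding call


def embed_text(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding call."""
    calls.append(text)
    return [float(len(text))]


class ChunkInsert(BaseModel):
    text: str
    embedding: list[float] = []

    @model_validator(mode="after")
    def embed_files(self) -> "ChunkInsert":
        # Fill in the embedding only when none (or an empty list) was given.
        if not self.embedding:
            self.embedding = embed_text(self.text)
        return self


print(ChunkInsert(text="foobar").embedding)                    # [6.0]
print(ChunkInsert(text="moodbar", embedding=[1.0]).embedding)  # [1.0]
print(calls)                                                   # ['foobar']
```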