I am used to normal Python classes, and I am trying to learn Pydantic now. It's been a lot harder than I expected. What I often do is instantiate a class with some initial input, and based on that input I "calculate" a lot of attributes for that class. I can't figure out how to create such "calculated" attributes in Pydantic.
I created the following example to demonstrate the issue:
from pydantic import BaseModel, computed_field
from typing import List

class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        print("initializing composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @property
    def composite_name_list(self) -> List[str]:
        print("initializing name_list")
        return [f"{self.composite_name} {i}" for i in range(5)]

p = Person(first_name="John", last_name="Doe")
print(p.composite_name_list)
In the code above I would expect it to run composite_name and create the composite_name attribute, and then run composite_name_list and create the composite_name_list attribute. It would thus go through each of these functions exactly once, printing "initializing composite_name" once and then "initializing name_list" once.
Instead, the print-out I get is:
initializing name_list
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
initializing composite_name
['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']
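A couple of odd things in this printout:
1. The first thing printed is "initializing name_list", while the print statement of "initializing composite_name" comes first in the code.
2. It seems to recalculate the composite name attribute every time it is called, even though I used the computed field decorator.
3. I added the last line print(p.composite_name_list) because otherwise it wouldn't print anything at all! In other words, instantiating the class Person does not automatically seem to cause the creation of my two computed properties.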
In standard Python, I would have just created this class like this:
class PersonStandardPython:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name
        self.composite_name = f"{first_name} {last_name}"
        self.composite_name_list = [f"{self.composite_name} {i}" for i in range(5)]
How can I get a result similar to my standard Python implementation while still having the benefit of Pydantic's strong typing?
I think there are some misunderstandings about how computed_field works and how it is meant to be used. computed_field acts very much like a property in Python, which is why it is used together with the property decorator. It mimics the appearance of an attribute, while computing its value only "on request" (see Python docs). The computed_field decorator then merely adds this property to the model's list of fields, so that it is included in e.g. serialization.
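To make the "on request" and "added to the fields" parts concrete, here is a minimal sketch assuming Pydantic v2. The plain property initials is a hypothetical addition for comparison: both are readable as attributes, but only the computed_field one shows up in model_dump().

```python
from pydantic import BaseModel, computed_field


class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

    # Hypothetical plain property (no @computed_field), only for comparison.
    @property
    def initials(self) -> str:
        return f"{self.first_name[0]}{self.last_name[0]}"


p = Person(first_name="John", last_name="Doe")
print(p.initials)      # JD, still usable as a normal property
print(p.model_dump())  # the computed field is serialized, the plain property is not
```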
In general, computed fields / properties can be used to re-compute a value derived from mutable attributes. In your first example, one could modify first_name or last_name and composite_name would still return the correct name, for example:
p = Person(first_name="John", last_name="Doe")
print(p.composite_name)
p.first_name = "Jane"
print(p.composite_name)
Which should print:
John Doe
Jane Doe
In contrast, in your second example, if you modified first_name, composite_name would still hold the value it was assigned on init, like so:
p = PersonStandardPython(first_name="John", last_name="Doe")
print(p.composite_name)
p.first_name = "Jane"
print(p.composite_name)
Which should print:
John Doe
John Doe
So the two cases exhibit completely different behavior with regard to mutability. If you want your Person object to be mutable, your first example is entirely correct! You just have to look at it again and understand its behavior. So let me address the three points you mentioned:
The first thing printed is "initializing name_list" while the print statement of "initializing composite_name" comes first.
This is entirely expected. Since computed_field works like a property, nothing runs until the attribute is accessed. Your last line accesses composite_name_list, so its body executes first; the other property, composite_name, is only accessed (and therefore executed) from inside that body.
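A minimal sketch of that ordering, assuming Pydantic v2. The module-level calls list is a hypothetical helper that records when each property body starts; composite_name is read into a local variable once so the order is easy to see.

```python
from typing import List

from pydantic import BaseModel, computed_field

calls: List[str] = []  # hypothetical log: order in which property bodies start


class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        calls.append("composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @property
    def composite_name_list(self) -> List[str]:
        calls.append("composite_name_list")
        # composite_name is only looked up here, inside this body
        name = self.composite_name
        return [f"{name} {i}" for i in range(5)]


p = Person(first_name="John", last_name="Doe")
p.composite_name_list
print(calls)  # ['composite_name_list', 'composite_name']
```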
It seems to recalculate the composite name attribute every time it is called, even though I used the computed field decorator.
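Again, this is entirely expected. Since it works just like a property, it re-executes the code defined in the method on every access. In composite_name_list, self.composite_name is accessed once per iteration of the list comprehension, so composite_name runs five times. However, you can cache the result of the computed property (more on this later).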
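Counting the executions makes the re-computation visible. This is a sketch assuming Pydantic v2; the module-level calls counter is hypothetical and only there for illustration. Each access of composite_name_list runs its own body once and composite_name five times.

```python
from collections import Counter
from typing import List

from pydantic import BaseModel, computed_field

calls = Counter()  # hypothetical: how many times each property body has run


class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        calls["composite_name"] += 1
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @property
    def composite_name_list(self) -> List[str]:
        calls["composite_name_list"] += 1
        # self.composite_name is evaluated once per loop iteration
        return [f"{self.composite_name} {i}" for i in range(5)]


p = Person(first_name="John", last_name="Doe")
p.composite_name_list
print(calls)  # Counter({'composite_name': 5, 'composite_name_list': 1})
p.composite_name_list
print(calls)  # Counter({'composite_name': 10, 'composite_name_list': 2})
```

Nothing is remembered between accesses, which is exactly the plain property behavior.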
I added the last line "print(p.composite_name_list)" because otherwise it wouldn't print anything at all! In other words, instantiating the class Person does not automatically seem to cause the creation of my two computed properties.
This is also expected, because the code is only executed when the computed field is accessed. It is "delayed" (lazy) and not computed on initialization of the object.
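A minimal sketch of the laziness, assuming Pydantic v2 (calls is a hypothetical log list): creating the instance runs nothing, and the property only runs once something reads it, here serialization via model_dump().

```python
from typing import List

from pydantic import BaseModel, computed_field

calls: List[str] = []  # hypothetical log of property executions


class Person(BaseModel):
    first_name: str
    last_name: str

    @computed_field
    @property
    def composite_name(self) -> str:
        calls.append("composite_name")
        return f"{self.first_name} {self.last_name}"


p = Person(first_name="John", last_name="Doe")
after_init = list(calls)
print(after_init)  # [] : nothing was computed during __init__

data = p.model_dump()  # serialization reads the computed field
print(calls)  # ['composite_name']
```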
Alternatively, with Pydantic you can achieve "faux immutability" (see faux immutability docs). This way you can compute the derived attributes on (or before) init and prevent the attributes they are based on from being modified later. For this you can use frozen=True in the class definition together with, for example, a model_validator:
from pydantic import BaseModel, model_validator
from typing import List, Optional

class Person(BaseModel, frozen=True):
    first_name: str
    last_name: str
    composite_name: Optional[str] = None
    composite_name_list: Optional[List[str]] = None

    @model_validator(mode="before")
    @classmethod
    def init_derived_attribute(cls, data, info):
        first_name = data.get("first_name")
        last_name = data.get("last_name")
        composite_name = f"{first_name} {last_name}"
        data["composite_name"] = composite_name
        data["composite_name_list"] = [f"{composite_name} {i}" for i in range(5)]
        return data

p = Person(first_name="John", last_name="Doe")
print(p.composite_name)
p.first_name = "Jane"  # this now raises an error!
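To see what that error is, here is a sketch assuming Pydantic v2, where assigning to a field of a frozen model raises a ValidationError of type frozen_instance. The validator is a slightly trimmed variant of the one above: it copies the input dict instead of mutating it and assumes keyword/dict input.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError, model_validator


class Person(BaseModel, frozen=True):
    first_name: str
    last_name: str
    composite_name: Optional[str] = None
    composite_name_list: Optional[List[str]] = None

    @model_validator(mode="before")
    @classmethod
    def init_derived_attribute(cls, data):
        # Assumes dict input; copy so the caller's dict is not mutated.
        data = dict(data)
        composite_name = f"{data.get('first_name')} {data.get('last_name')}"
        data["composite_name"] = composite_name
        data["composite_name_list"] = [f"{composite_name} {i}" for i in range(5)]
        return data


p = Person(first_name="John", last_name="Doe")
print(p.composite_name)  # John Doe, already stored on the instance

error_type = None
try:
    p.first_name = "Jane"
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # frozen_instance
print(p.first_name, "/", p.composite_name)  # John / John Doe, still consistent
```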
While the example above works fine, I think it is not the cleanest solution. You mention that you would mostly like to avoid the re-computation of the field. The solution for this is simple: you can use cached_property from the standard functools library. However, in this case you should still combine it with faux immutability, so that the object cannot be modified in memory and the cached derived property cannot go out of sync. Here is the final code I would propose:
from pydantic import BaseModel, computed_field
from typing import List
from functools import cached_property

class Person(BaseModel, frozen=True):
    first_name: str
    last_name: str

    @computed_field
    @cached_property
    def composite_name(self) -> str:
        print("initializing composite_name")
        return f"{self.first_name} {self.last_name}"

    @computed_field
    @cached_property
    def composite_name_list(self) -> List[str]:
        print("initializing name_list")
        return [f"{self.composite_name} {i}" for i in range(5)]

p = Person(first_name="John", last_name="Doe")
print(p.composite_name_list)
Which prints:
initializing name_list
initializing composite_name
['John Doe 0', 'John Doe 1', 'John Doe 2', 'John Doe 3', 'John Doe 4']
While the execution order stays the same (see above, this is expected), this avoids the re-computation of composite_name, which is now printed only once. All subsequent accesses return the cached value. One important note: it is typically only worth using cached_property if the computation is rather "expensive". If you really just concatenate two strings, doing it repeatedly might be just fine.
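The reason for pairing cached_property with frozen=True can be shown by leaving frozen out. This sketch assumes Pydantic v2 and that assigning a regular field does not clear cached_property values; MutablePerson is a hypothetical name. The cached value then goes stale.

```python
from functools import cached_property

from pydantic import BaseModel, computed_field


class MutablePerson(BaseModel):  # hypothetical: note there is no frozen=True
    first_name: str
    last_name: str

    @computed_field
    @cached_property
    def composite_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


p = MutablePerson(first_name="John", last_name="Doe")
print(p.composite_name)  # John Doe, computed and cached on first access
p.first_name = "Jane"    # allowed, because the model is not frozen
print(p.composite_name)  # still John Doe: the cache is out of sync
```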
If you intend your Person class to be mutable, your first proposed solution is just fine! The re-computation ensures that the derived fields / properties are always "up to date" with the attributes they are derived from. Alternatively, you can switch to one of the "faux immutability" solutions I proposed above, both of which avoid the re-computation.