I have a Pydantic model with a field of type AnyUrl.
When exporting the model to YAML, the AnyUrl is serialized as individual field slots, instead of a single string URL (perhaps due to how the AnyUrl.__repr__ method is implemented).
For example:
from pydantic import BaseModel, AnyUrl
import yaml

class MyModel(BaseModel):
    url: AnyUrl

data = {'url': 'https://www.example.com'}
model = MyModel.parse_obj(data)
y = yaml.dump(model.dict(), indent=4)
print(y)
Produces:
url: !!python/object/new:pydantic.networks.AnyUrl
    args:
    - https://www.example.com
    state: !!python/tuple
    - null
    -   fragment: null
        host: www.example.com
        host_type: domain
        password: null
        path: null
        port: null
        query: null
        scheme: https
        tld: com
        user: null
Ideally, I would like the serialized YAML to contain https://www.example.com instead of individual fields.
I have tried to override the __repr__ method of AnyUrl to return the AnyUrl object itself, as it extends the str class, but no luck.
Unfortunately, the pyyaml documentation is just horrendous, so seemingly elementary things like customizing (de-)serialization are a pain to figure out properly. But there are essentially two ways you could solve this.
Option A: Subclass YAMLObject
You had the right idea of subclassing AnyUrl, but the __repr__ method is irrelevant for YAML serialization. For that you need to do three things:
1. inherit from YAMLObject,
2. define a custom yaml_tag, and
3. override the to_yaml classmethod.
Then pyyaml will serialize this custom class (that inherits from both AnyUrl and YAMLObject) in accordance with what you define in to_yaml.
The to_yaml method always receives exactly two arguments:
1. a yaml.Dumper instance with built-in capabilities to serialize standard types (via methods like represent_str for example) and
2. the data to be serialized.
To avoid adding/overriding additional methods, you can leverage the fact that AnyUrl inherits from str and the underlying str.__new__ method actually receives the full URL during construction. Therefore the str.__str__ method will return that "as is".
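To see the str.__str__ point in isolation, here is a minimal sketch that needs neither Pydantic nor pyyaml. FakeUrl is a hypothetical stand-in for AnyUrl (not a real class from either library); it only assumes what is said above, namely that the class subclasses str and passes the full URL to str.__new__.

```python
class FakeUrl(str):
    """Hypothetical stand-in for a str subclass that carries parsed URL parts."""

    def __new__(cls, url: str, *, scheme: str) -> "FakeUrl":
        # str.__new__ receives the full URL, so the string value is the URL itself.
        obj = super().__new__(cls, url)
        obj.scheme = scheme
        return obj

    def __repr__(self) -> str:
        # A custom repr with extra fields; it has no effect on the str value.
        return f"FakeUrl({str.__repr__(self)}, scheme={self.scheme!r})"


url = FakeUrl("https://www.example.com", scheme="https")

# Calling str.__str__ directly yields the underlying string as a plain str.
plain = str.__str__(url)

print(repr(url))
print(plain)
print(type(plain) is str)
```

The custom repr and the extra attribute play no role in the plain value, which is why the representer only has to hand that value to represent_str.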
from pydantic import AnyUrl, BaseModel
from yaml import Dumper, ScalarNode, YAMLObject, dump, safe_load

class Url(AnyUrl, YAMLObject):
    yaml_tag = "!Url"

    @classmethod
    def to_yaml(cls, dumper: Dumper, data: str) -> ScalarNode:
        # Serialize as a plain string scalar holding the full URL.
        return dumper.represent_str(str.__str__(data))

class MyModel(BaseModel):
    foo: int = 0
    url: Url

obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)
serialized = dump(obj.dict()).strip()
print(serialized)
# Round trip: load the YAML back and validate it with the model again.
deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, Url))
Output:
foo=0 url=Url('https://www.example.com', scheme='https', host='www.example.com', tld='com', host_type='domain')
foo: 0
url: https://www.example.com
True
Option B: Register a representer for AnyUrl
You can avoid defining your own subclass and instead globally register a function that defines how instances of AnyUrl should be serialized, by using the yaml.add_representer function.
That function takes two mandatory arguments:
1. the class for which you want to define a custom representer and
2. the representer function itself.
The representer function essentially has to have the same signature as the YAMLObject.to_yaml classmethod presented in option A, i.e. it takes a Dumper instance and the data to be serialized as arguments.
from pydantic import AnyUrl, BaseModel
from yaml import Dumper, ScalarNode, add_representer, dump, safe_load

def url_representer(dumper: Dumper, data: AnyUrl) -> ScalarNode:
    # Serialize as a plain string scalar holding the full URL.
    return dumper.represent_str(str.__str__(data))

# Globally register the representer for AnyUrl on the default Dumper.
add_representer(AnyUrl, url_representer)

class MyModel(BaseModel):
    foo: int = 0
    url: AnyUrl

obj = MyModel.parse_obj({"url": "https://www.example.com"})
print(obj)
serialized = dump(obj.dict()).strip()
print(serialized)
deserialized = MyModel.parse_obj(safe_load(serialized))
print(deserialized == obj and isinstance(deserialized.url, AnyUrl))
Output is the same as with the code from option A.
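One caveat that may be worth checking in your setup: as far as I can tell, add_representer matches on the exact type only, so fields typed with a subclass of AnyUrl (HttpUrl, for instance) would not be covered. pyyaml also has add_multi_representer, which applies to subclasses too. The sketch below uses two hypothetical stand-in classes, FakeUrl and FakeHttpUrl, instead of the Pydantic types, so it runs with pyyaml alone.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for AnyUrl: a plain str subclass."""


class FakeHttpUrl(FakeUrl):
    """Hypothetical stand-in for a more specific URL type."""


def url_representer(dumper: yaml.Dumper, data: FakeUrl) -> yaml.ScalarNode:
    # Same representer as before: emit the full URL as a plain string scalar.
    return dumper.represent_str(str.__str__(data))


# A multi-representer applies to the registered class and to its subclasses.
yaml.add_multi_representer(FakeUrl, url_representer)

serialized = yaml.dump(
    {
        "base": FakeUrl("https://a.example"),
        "http": FakeHttpUrl("https://b.example"),
    }
)
print(serialized)
```

Both values should come out as plain strings without a python/object tag. Swapping FakeUrl for AnyUrl is the assumed next step.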
The benefit of this approach is that it involves less code and avoids potential namespace collisions between the two parent classes in option A.
A potential drawback is that it modifies a global setting for the entire runtime of the program, which can become less transparent as your application grows. It is just something to be aware of, in case you decide you want to serialize AnyUrl objects differently at some point.
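If the global side effect bothers you, one possible way around it is to register the representer on your own Dumper subclass and pass that subclass to dump explicitly. This is only a sketch: UrlDumper and FakeUrl are hypothetical names, and FakeUrl again stands in for AnyUrl so that the example runs with pyyaml alone.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for AnyUrl: a plain str subclass."""


class UrlDumper(yaml.Dumper):
    """Hypothetical Dumper subclass that carries the custom representer."""


def url_representer(dumper: yaml.Dumper, data: FakeUrl) -> yaml.ScalarNode:
    # Emit the full URL as a plain string scalar.
    return dumper.represent_str(str.__str__(data))


# Register on UrlDumper only; the default yaml.Dumper is not modified.
UrlDumper.add_representer(FakeUrl, url_representer)

data = {"url": FakeUrl("https://www.example.com")}

scoped = yaml.dump(data, Dumper=UrlDumper)  # uses the custom representer
default = yaml.dump(data)                   # falls back to the generic object tag

print(scoped)
print(default)
```

Only calls that pass Dumper=UrlDumper pick up the custom behavior, so other parts of the program can still serialize the same type differently.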