
What is the new way to declare a Mongo ObjectId with Pydantic v2.0+?


This week I started working with MongoDB and Flask, and I found a helpful article on using them together, with the Pydantic library defining the MongoDB models. The article is somewhat outdated. Most of it can be updated to the new Pydantic version, but ObjectId is a third-party type, and the way Pydantic supports third-party types changed drastically between versions.

The article defines the ObjectId using the following code:

from bson import ObjectId
from pydantic.json import ENCODERS_BY_TYPE


class PydanticObjectId(ObjectId):
    """
    Object Id field. Compatible with Pydantic.
    """

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    # The validator just wraps the value in PydanticObjectId
    @classmethod
    def validate(cls, v):
        return PydanticObjectId(v)

    # Tell the schema that this field is represented as a string
    @classmethod
    def __modify_schema__(cls, field_schema: dict):
        field_schema.update(
            type="string",
            examples=["5eb7cf5a86d9755df3a6c593", "5eb7cfb05e32e07750a1756a"],
        )

# Encode the ObjectId as a string
ENCODERS_BY_TYPE[PydanticObjectId] = str

This code worked well with Pydantic v1. However, I recently discovered that Pydantic v2 has a more complex way of defining custom data types. I've tried following the Pydantic documentation, but I'm still confused and haven't managed to implement it.

I've tried adapting the documentation's example for third-party types, but it isn't working. My code is almost the same as the documentation's, except that I swapped the ints for strings and the third-party callables for ObjectId. I'm not sure why it fails.

from bson import ObjectId
from pydantic_core import core_schema 
from typing import Annotated, Any
from pydantic import BaseModel, GetJsonSchemaHandler, ValidationError

from pydantic.json_schema import JsonSchemaValue


class PydanticObjectId(ObjectId):
    """
    Object Id field. Compatible with Pydantic.
    """

    x: str

    def __init__(self):
        self.x = ''

class _ObjectIdPydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
            cls,
            _source_type: Any,
            _handler: ObjectId[[Any], core_schema.CoreSchema],
        ) -> core_schema.CoreSchema:

        @classmethod
        def validate_object_id(cls, v: ObjectId) -> PydanticObjectId:
            if not ObjectId.is_valid(v):
                raise ValueError("Invalid objectid")
            return PydanticObjectId(v)
        
        from_str_schema = core_schema.chain_schema(
            [
                core_schema.str_schema(),
                core_schema.no_info_plain_validator_function(validate_object_id),
            ]
        )
        return core_schema.json_or_python_schema(
            json_schema=from_str_schema,
            python_schema=core_schema.union_schema(
                [
                    # check if it's an instance first before doing any further work
                    core_schema.is_instance_schema(PydanticObjectId),
                    from_str_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: instance.x
            ),
        )
    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        # Use the same schema that would be used for `int`
        return handler(core_schema.int_schema())

I've searched for answers on Stack Overflow, but all the answers I've found refer to older versions of Pydantic and use code similar to what I pasted above. If anyone knows of an alternative solution or can provide clear guidance on how to define a custom data type in the latest version of Pydantic, I would greatly appreciate it.


Update

The error I keep getting, because I'm not defining the ObjectId type correctly, is this:

Unable to generate pydantic-core schema for <class 'bson.objectid.ObjectId'>. Set arbitrary_types_allowed=True in the model_config to ignore this error or implement __get_pydantic_core_schema__ on your type to fully support it.

If you got this error by calling handler() within __get_pydantic_core_schema__ then you likely need to call handler.generate_schema(<some type>) since we do not call __get_pydantic_core_schema__ on <some type> otherwise to avoid infinite recursion.

For further information visit https://errors.pydantic.dev/2.0.2/u/schema-for-unknown-type

The error message suggests allowing it as an arbitrary (unknown) type, but I don't want that; I want to declare it properly as an ObjectId.


Solution

  • It's generally best to ask questions like this on Pydantic's GitHub Discussions.

    Your solution is pretty close; I think you just have the wrong core schema.

    I think our documentation on using custom types via Annotated covers this fairly well, but to help you, here is a working implementation:

    from typing import Annotated, Any

    from bson import ObjectId
    from pydantic import BaseModel
    from pydantic.json_schema import JsonSchemaValue
    from pydantic_core import core_schema


    class ObjectIdPydanticAnnotation:
        @classmethod
        def validate_object_id(cls, v: Any, handler) -> ObjectId:
            # Already an ObjectId: nothing to do.
            if isinstance(v, ObjectId):
                return v

            # Otherwise let the inner str_schema() validate the input as a string.
            s = handler(v)
            if ObjectId.is_valid(s):
                return ObjectId(s)
            raise ValueError("Invalid ObjectId")

        @classmethod
        def __get_pydantic_core_schema__(cls, source_type, _handler) -> core_schema.CoreSchema:
            assert source_type is ObjectId
            # Wrap validator around a string schema; serialize to str in JSON mode.
            return core_schema.no_info_wrap_validator_function(
                cls.validate_object_id,
                core_schema.str_schema(),
                serialization=core_schema.to_string_ser_schema(),
            )

        @classmethod
        def __get_pydantic_json_schema__(cls, _core_schema, handler) -> JsonSchemaValue:
            # Document the field as a plain string in the JSON schema.
            return handler(core_schema.str_schema())


    class Model(BaseModel):
        id: Annotated[ObjectId, ObjectIdPydanticAnnotation]


    print(Model(id='64b7abdecf2160b649ab6085'))
    print(Model(id='64b7abdecf2160b649ab6085').model_dump_json())
    print(Model(id=ObjectId()))
    print(Model.model_json_schema())
    print(Model(id='foobar'))  # raises ValidationError (Invalid ObjectId)