python, inheritance, pydantic, abstract-factory

Use a Pydantic child model to manage sets of default values for the parent model


I am using pydantic to manage settings for an app that supports different datasets. Each dataset has a set of overridable defaults, but the defaults differ per dataset. Currently, I have all of the logic correctly implemented via validators:

from pydantic import BaseModel, validator

class DatasetSettings(BaseModel):
    dataset_name: str
    table_name: str = None  # placeholder; the always= validator below supplies the default

    @validator("table_name", always=True)
    def validate_table_name(cls, v, values):
        if isinstance(v, str):
            return v
        if values["dataset_name"] == "DATASET_1":
            return "special_dataset_1_default_table"
        if values["dataset_name"] == "DATASET_2":
            return "special_dataset_2_default_table"
        return "default_table"

class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: str

This way, I get different defaults based on dataset_name, but the user can override them if necessary. This is the desired behavior. The trouble is that once there are more than a handful of such fields and names, it gets to be a mess to read and maintain. It seems like inheritance/polymorphism would solve this problem, but pydantic's factory logic seems too hardcoded to make it feasible, especially with nested models.

class Dataset1Settings(DatasetSettings):
    dataset_name: str = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"

class Dataset2Settings(DatasetSettings):
    dataset_name: str = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"

def dataset_settings_factory(dataset_name, table_name=None):
    if dataset_name == "DATASET_1":
        return Dataset1Settings(dataset_name=dataset_name, table_name=table_name)
    if dataset_name == "DATASET_2":
        return Dataset2Settings(dataset_name=dataset_name, table_name=table_name)
    return DatasetSettings(dataset_name=dataset_name, table_name=table_name)

class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: str

Options I've considered:

I was hoping Field(default_factory=dataset_settings_factory) would work, but default_factory is only for actual defaults, so it takes zero arguments. Is there some other way to intercept the args of a particular pydantic field and use a custom factory?
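One hook that can play this interception role is a `validator` with `pre=True` on the outer model: it receives the raw field value before `DatasetSettings` is constructed, so a custom factory can run there. A minimal sketch against pydantic v1, reusing the model names above (the validator name is illustrative, not part of the original code):

```python
from typing import Optional
from pydantic import BaseModel, validator


class DatasetSettings(BaseModel):
    dataset_name: str
    table_name: Optional[str] = None


class Dataset1Settings(DatasetSettings):
    dataset_name: str = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"


class Dataset2Settings(DatasetSettings):
    dataset_name: str = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"


class AppSettings(BaseModel):
    dataset_settings: DatasetSettings
    app_url: Optional[str] = None

    @validator("dataset_settings", pre=True)
    def pick_dataset_settings(cls, v):
        # v is the raw input (here a dict) before DatasetSettings is
        # built, so this is where a custom factory can choose a subclass.
        factories = {"DATASET_1": Dataset1Settings, "DATASET_2": Dataset2Settings}
        if isinstance(v, dict) and v.get("dataset_name") in factories:
            return factories[v["dataset_name"]](**v)
        return v
```

The subclass instance returned by the validator passes the field's isinstance check, so the dataset-specific defaults survive while explicit user values still win.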


Solution

  • I ended up solving the problem following the first option, as follows. The code runs under both pydantic 1.8.2 and 1.9.1.

    from typing import Optional
    from pydantic import BaseModel, Field
    
    
    class DatasetSettings(BaseModel):
        dataset_name: Optional[str] = Field(default="DATASET_1")
        table_name: Optional[str] = None
    
        def __init__(self, **data):
            # Look up the dataset-specific model, let it fill in its
            # defaults, then validate the merged dict as usual.
            factory_dict = {"DATASET_1": Dataset1Settings, "DATASET_2": Dataset2Settings}
            dataset_name = (
                data["dataset_name"]
                if "dataset_name" in data
                else self.__fields__["dataset_name"].default
            )
            if dataset_name in factory_dict:
                data = factory_dict[dataset_name](**data).dict()
            super().__init__(**data)
    
    
    class Dataset1Settings(BaseModel):
        dataset_name: str = "DATASET_1"
        table_name: str = "special_dataset_1_default_table"
    
    
    class Dataset2Settings(BaseModel):
        dataset_name: str = "DATASET_2"
        table_name: str = "special_dataset_2_default_table"
    
    
    class AppSettings(BaseModel):
        dataset_settings: DatasetSettings = Field(default_factory=DatasetSettings)
        app_url: Optional[str]
    
    
    app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_1"})
    assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
    app_settings = AppSettings(dataset_settings={"dataset_name": "DATASET_2"})
    assert app_settings.dataset_settings.table_name == "special_dataset_2_default_table"
    
    # bonus: no args mode
    app_settings = AppSettings()
    assert app_settings.dataset_settings.table_name == "special_dataset_1_default_table"
    

    A couple of gotchas I discovered along the way:

    1. If Dataset1Settings inherits from DatasetSettings, it enters a recursive loop, calling __init__ on __init__ ad infinitum. This could be broken with some introspection, but I opted for the duck-typing approach: the subclasses inherit from plain BaseModel.
    2. The current solution sidesteps any validators defined on DatasetSettings. I'm sure there is a way to invoke the validation logic anyway, but as written, class-level validation is effectively bypassed because the dispatched data only passes through super().__init__.
    3. The same approach works for BaseSettings objects, but you have to drag along their cumbersome __init__ args:
        def __init__(
            self,
            _env_file: Union[Path, str, None] = None,
            _env_file_encoding: Optional[str] = None,
            _secrets_dir: Union[Path, str, None] = None,
            **values: Any
        ):
            ...
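Gotcha 1 can also be addressed with the introspection it alludes to: keep the subclasses inheriting from DatasetSettings, and dispatch only when the base class itself is being constructed. A hedged sketch (pydantic v1 assumed; the hardcoded "DATASET_1" fallback stands in for reading the field's default):

```python
from typing import Optional
from pydantic import BaseModel


class DatasetSettings(BaseModel):
    dataset_name: Optional[str] = "DATASET_1"
    table_name: Optional[str] = None

    def __init__(self, **data):
        # Dispatch only when DatasetSettings itself is constructed;
        # subclass instances fall straight through to BaseModel's
        # __init__, which is what breaks the infinite recursion.
        if type(self) is DatasetSettings:
            factory_dict = {
                "DATASET_1": Dataset1Settings,
                "DATASET_2": Dataset2Settings,
            }
            dataset_name = data.get("dataset_name", "DATASET_1")
            if dataset_name in factory_dict:
                data = factory_dict[dataset_name](**data).dict()
        super().__init__(**data)


class Dataset1Settings(DatasetSettings):
    dataset_name: str = "DATASET_1"
    table_name: str = "special_dataset_1_default_table"


class Dataset2Settings(DatasetSettings):
    dataset_name: str = "DATASET_2"
    table_name: str = "special_dataset_2_default_table"
```

Because the subclasses now genuinely inherit from DatasetSettings, validators declared on the parent are inherited and run when a subclass is instantiated, which may also soften gotcha 2.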