python, python-dataclasses, python-datamodel

Best way to create a serializable data model


I'm somewhat inexperienced in Python. I'm coming from the C# world, and I'm trying to figure out the best practice for creating a data structure in Python that:

  1. can have empty fields (None)
  2. can have default values assigned to some fields
  3. can have aliases assigned to fields that would be used during serialization

To clarify, for example in C# I can do something like this:

using Newtonsoft.Json;
public class MyDataClass
{
    [JsonProperty(PropertyName = "Data Label")]
    public string label { get; set; }

    [JsonProperty(PropertyName = "Data Value")]
    public string value { get; set; }

    [JsonProperty(PropertyName = "Data Description")]
    public MyDataDefinition definition { get; set; }

    public MyDataClass()
    {
        this.label = "Default Label";
    }
}

With this class, I can create an instance with only one field pre-populated, populate the rest of the data structure at will, and then serialize it to JSON with the aliased field names as decorated.

In Python, I experimented with several packages, but every time I end up with a super complex implementation that doesn't hit all of the requirements. I MUST be missing something very fundamental, because this seems like such a simple and common use case.

How would you implement something like this in the most "Pythonic" way?


Solution

  • Hard to say what the "best practice" is; personally, I'd say that just working with dictionaries is very common unless you have a good reason to define a class instead (limitations: no default values, no aliases). Whether that works for you depends on how you intend to use the data. If you have a dictionary, serializing it is just

    import json
    
    record = { 
        "Data Label": "Default Label",
        "Data Value": None, 
        "Data Description": {
            "f1": 1, 
            "f2":"2"
        }
    }
    with open("my_record.json", "w") as f:
        json.dump(record, f)
    

    And reading it would be

    import json
    
    with open("my_record.json", "r") as f:
        record = json.load(f)
    
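    Plain dicts don't give you default values, as mentioned above; if you need them, one common workaround (just a sketch here, the DEFAULTS dict is something you would define yourself) is to merge the loaded data over a dict of defaults:

    import json
    
    DEFAULTS = {
        "Data Label": "Default Label",
        "Data Value": None,
    }
    
    with open("my_record.json", "r") as f:
        loaded = json.load(f)
    
    # keys present in the file win; anything missing falls back to the default
    record = {**DEFAULTS, **loaded}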

    If you're coming from a language that follows OO principles to a T, it might look weird to handle data without a class serving as an interface. But often enough, it's just fine.

    If it turns out that it isn't, and you really want some kind of schema that tells you what your data looks like / helps your IDE figure out auto-completion, you can add a TypedDict definition to the existing code (new limitation: only valid Python variable names can be keys):

    from typing import TypedDict, cast
    import json
    
    class MyDataContainer(TypedDict):
        label: str
        value: str | None
        definition: "MyDataDefinition"
    
    class MyDataDefinition(TypedDict):
        f1: int
        f2: str
    
    with open("my_record.json", "r") as f:
        record = cast(MyDataContainer, json.load(f))
    
    record["label"]  # at this point your IDE should hint "label", "value", or "definition"
    

    Note: The cast doesn't do anything at run time; it just asserts the type to tooling like your IDE. If you want actual run-time checks against the data you're loading, you need a third-party library like typeguard. First, though, ask yourself: is there actual value in this, or are you doing it merely to "do things the right way"?
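
    If you do decide run-time validation is worth it, a rough sketch with typeguard could look like this (assuming typeguard 4.x, where check_type(value, expected_type) raises an error when the value doesn't match the annotation and returns it otherwise):

    from typeguard import check_type
    
    data = {"label": "Default Label", "value": None, "definition": {"f1": 1, "f2": "2"}}
    
    # raises if a key is missing or a value has the wrong type, checking against the
    # MyDataContainer TypedDict defined above; otherwise it just hands back the dict
    record = check_type(data, MyDataContainer)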


    If you can't work with the limitations of dictionaries that I outlined, I'd recommend going with pydantic. It supports serializing to / deserializing from JSON, aliases, defaults, and many more things:

    import json
    import pydantic
    
    class MyDataContainer(pydantic.BaseModel):
        label: str = pydantic.Field("Default Label", alias="Data Label")
        value: str | None = pydantic.Field(alias="Data Value")
        definition: "MyDataDefinition" = pydantic.Field(alias="Data Description")
    
    class MyDataDefinition(pydantic.BaseModel):
        f1: int
        f2: str
    
    with open("my_record.json", "r") as f:
        record = MyDataContainer(**json.load(f))
        # pydantic would have complained if the json didn't comply
    
    print(record.label)  # prints: "Default Label"
    print(record.model_dump_json(indent=2, by_alias=True))
    # prints:
    # {
    #   "Data Label": "Default Label", 
    #   "Data Value": null, 
    #   "Data Description": { 
    #     "f1": 1, 
    #     "f2": "2" 
    #   } 
    # }
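
    As a small variation (same pydantic v2 model as above), you can also let pydantic parse the JSON text itself instead of going through json.load:

    with open("my_record.json", "r") as f:
        record = MyDataContainer.model_validate_json(f.read())
    
    # round-trips back to the aliased JSON shown above
    print(record.model_dump_json(indent=2, by_alias=True))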