
Creating nested dataclass objects in Python


I have a dataclass object that has nested dataclass objects in it. However, when I create the main object from a dictionary, the nested objects stay plain dictionaries instead of becoming dataclass instances:

from dataclasses import dataclass

@dataclass
class One:
    f_one: int
    f_two: str

@dataclass
class Two:
    f_three: str
    f_four: One


# Passing a nested dict leaves f_four as a dict:
Two(**{'f_three': 'three', 'f_four': {'f_one': 1, 'f_two': 'two'}})
# Two(f_three='three', f_four={'f_one': 1, 'f_two': 'two'})

# Building the nested object by hand first gives the expected result:
obj = {'f_three': 'three', 'f_four': One(**{'f_one': 1, 'f_two': 'two'})}
Two(**obj)
# Two(f_three='three', f_four=One(f_one=1, f_two='two'))

As you can see, only the second form works, where obj already holds a One instance for f_four.

Ideally I'd like to construct my object to get something like this:

Two(f_three='three', f_four=One(f_one=1, f_two='two'))

Is there any way to achieve that other than manually converting nested dictionaries to the corresponding dataclass objects whenever accessing object attributes?


Solution

  • This request is about as complex as the dataclasses module itself, so the best way to get this "nested fields" capability is probably to define a new decorator, akin to @dataclass.

    Fortunately, if you don't need the signature of the __init__ method to reflect the fields and their defaults, as it does on classes produced by dataclass, this gets a whole lot simpler: a class decorator can call the original dataclass and wrap the generated __init__ method with a plain "...(*args, **kwargs):" style function.

    In other words, all you need is a wrapper around the generated __init__ that inspects the parameters passed in "kwargs", checks whether any of them corresponds to a field whose type is a dataclass, and if so builds the nested object before calling the original __init__. This may be harder to spell out in English than in Python:

    from dataclasses import dataclass, is_dataclass

    def nested_dataclass(*args, **kwargs):
        def wrapper(cls):
            # Build the regular dataclass first, then wrap its generated __init__.
            cls = dataclass(cls, **kwargs)
            original_init = cls.__init__
            def __init__(self, *args, **kwargs):
                for name, value in kwargs.items():
                    field_type = cls.__annotations__.get(name, None)
                    # Convert a dict passed for a dataclass-typed field.
                    if is_dataclass(field_type) and isinstance(value, dict):
                        new_obj = field_type(**value)
                        kwargs[name] = new_obj
                original_init(self, *args, **kwargs)
            cls.__init__ = __init__
            return cls
        # Support both @nested_dataclass and @nested_dataclass(...).
        return wrapper(args[0]) if args else wrapper
    

    Note that, besides not preserving the __init__ signature, this also ignores init=False, since passing it would be meaningless here anyway.

    (The if in the return line lets this work both when called with named parameters and when used directly as a decorator, like dataclass itself.)

    And on the interactive prompt:

    In [85]: @dataclass
        ...: class A:
        ...:     b: int = 0
        ...:     c: str = ""
        ...:         
    
    In [86]: @dataclass
        ...: class A:
        ...:     one: int = 0
        ...:     two: str = ""
        ...:     
        ...:         
    
    In [87]: @nested_dataclass
        ...: class B:
        ...:     three: A
        ...:     four: str
        ...:     
    
    In [88]: @nested_dataclass
        ...: class C:
        ...:     five: B
        ...:     six: str
        ...:     
        ...:     
    
    In [89]: obj = C(five={"three":{"one": 23, "two":"narf"}, "four": "zort"}, six="fnord")
    
    In [90]: obj.five.three.two
    Out[90]: 'narf'
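
    Note that the decorator above only looks at keyword arguments, so a dict passed positionally reaches the original __init__ unconverted. One possible way around that, as a minimal sketch: bind the call against the generated signature with inspect.signature, so positional and keyword arguments are handled alike. The One and Two classes are the ones from the question; everything else is an assumption about how you might want this to behave, not a tested library.

```python
import inspect
from dataclasses import dataclass, is_dataclass


def nested_dataclass(*args, **kwargs):
    def wrapper(cls):
        cls = dataclass(cls, **kwargs)
        original_init = cls.__init__
        # Signature of the generated __init__, including the leading "self".
        signature = inspect.signature(original_init)

        def __init__(self, *args, **kwargs):
            # Map every argument, positional or keyword, to its parameter name.
            bound = signature.bind(self, *args, **kwargs)
            for name, value in bound.arguments.items():
                field_type = cls.__annotations__.get(name)
                if is_dataclass(field_type) and isinstance(value, dict):
                    bound.arguments[name] = field_type(**value)
            # bound.args already contains self as its first element.
            original_init(*bound.args, **bound.kwargs)

        cls.__init__ = __init__
        return cls
    return wrapper(args[0]) if args else wrapper


@dataclass
class One:
    f_one: int
    f_two: str


@nested_dataclass
class Two:
    f_three: str
    f_four: One


positional = Two('three', {'f_one': 1, 'f_two': 'two'})
keyword = Two(f_three='three', f_four={'f_one': 1, 'f_two': 'two'})
print(positional)
```

    Like the original, this only converts fields annotated directly with a dataclass type; containers such as lists of dataclasses are passed through untouched.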
    

    If you want the signature to be kept, I'd recommend using the private helper functions in the dataclasses module itself, to create a new __init__.
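
    A lighter alternative, if all you need is for inspect.signature (and tools built on it) to report the generated signature: decorate the wrapper with functools.wraps. This is only a sketch and not the private-helper approach; the real __init__ is still the "(*args, **kwargs)" wrapper, it just advertises the original signature through __wrapped__. The One and Two classes are again taken from the question.

```python
import functools
import inspect
from dataclasses import dataclass, is_dataclass


def nested_dataclass(*args, **kwargs):
    def wrapper(cls):
        cls = dataclass(cls, **kwargs)
        original_init = cls.__init__

        # functools.wraps sets __wrapped__, which inspect.signature follows,
        # so the reported signature is the one dataclass generated.
        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            for name, value in kwargs.items():
                field_type = cls.__annotations__.get(name)
                if is_dataclass(field_type) and isinstance(value, dict):
                    kwargs[name] = field_type(**value)
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls
    return wrapper(args[0]) if args else wrapper


@dataclass
class One:
    f_one: int
    f_two: str


@nested_dataclass
class Two:
    f_three: str
    f_four: One


# Shows the field-based parameters rather than (*args, **kwargs).
print(inspect.signature(Two))
two = Two(f_three='three', f_four={'f_one': 1, 'f_two': 'two'})
print(two)
```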