python fb-hydra omegaconf

Default (nested) dataclass initialization in hydra when no arguments are provided


I have the following code, which uses the Hydra framework:

# dummy_hydra.py

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig, OmegaConf


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1


@dataclass
class FooBar:
    foo: Foo
    bar: Bar


cs = ConfigStore.instance()
cs.store(name="config_schema", node=FooBar)


@hydra.main(config_name="dummy_config", config_path=".", version_base=None)
def main(config: DictConfig):
    config_obj: FooBar = OmegaConf.to_object(config)
    print(config_obj)


if __name__ == '__main__':
    main()

(This is a simplified version of my actual use case, of course.)

As you can see, I have a nested dataclass: the FooBar class contains instances of Foo and Bar. Both Foo and Bar have default attribute values, so I thought I could define a YAML file that does not necessarily initialize Foo and/or Bar. Here's the file I use:

# dummy_config.yaml
defaults:
  - config_schema
  - _self_

foo:
  x: 123
  y: 456

When I run this code, surprisingly (?) it does not initialize Bar (which is not mentioned in the YAML config file) with its default values, but throws an error:

omegaconf.errors.MissingMandatoryValue: Structured config of type `FooBar` has missing mandatory value: bar
    full_key: bar
    object_type=FooBar

What's the proper way to use this class structure such that I don't need to explicitly initialize classes with non-mandatory fields (such as Bar)?


Solution

  • Dataclass fields that are declared without a value are considered missing. This semantic is specific to OmegaConf (the underlying config library powering Hydra): accessing such a field raises the MissingMandatoryValue exception, and so does converting the config with OmegaConf.to_object, which is what happens in your code. You can use OmegaConf.is_missing(cfg, "bar") to determine whether the field is missing without triggering the exception.

    In a pure YAML config, you can mark a value as missing by using ??? in your config file. In Structured Configs (dataclasses) you can do it explicitly by assigning MISSING (imported from omegaconf) to a field.

    It is not clear from your question what you want in the bar field. If it's None, you can change the signature of your dataclass to something like:

    from typing import Optional

    @dataclass
    class FooBar:
        foo: Optional[Foo] = None
        bar: Optional[Bar] = None
    

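    If you want foo and bar initialized to their default values, just assign Foo() and Bar() respectively. (On Python 3.11 and later, dataclasses rejects an unhashable dataclass instance as a field default and raises ValueError, so use field(default_factory=Foo) and field(default_factory=Bar) there instead.) I saw in another comment that you are concerned that the instance will be shared. This is not the case: the config is converted to an OmegaConf DictConfig in any case before you convert it to an object. Try and see.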

    
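    from dataclasses import dataclass, field

    import hydra
    from hydra.core.config_store import ConfigStore
    from omegaconf import DictConfig, OmegaConf


    @dataclass
    class Foo:
        x: int = 0
        y: int = 1
    
    
    @dataclass
    class Bar:
        a: int = 0
        b: int = 1
        # default_factory is needed on Python 3.11+, where a plain
        # `f: Foo = Foo()` default raises ValueError (unhashable default).
        f: Foo = field(default_factory=Foo)
    
    
    @dataclass
    class FooBar:
        foo: Foo = field(default_factory=Foo)
        bar1: Bar = field(default_factory=Bar)
        bar2: Bar = field(default_factory=Bar)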
    
    
    cs = ConfigStore.instance()
    cs.store(name="config_schema", node=FooBar)
    
    
    @hydra.main(config_name="dummy_config", config_path=".", version_base=None)
    def main(config: DictConfig):
        config_obj: FooBar = OmegaConf.to_object(config)
        config_obj.foo.x = 100
        config_obj.bar1.f.x = 200
        config_obj.bar2.f.x = 300
        print(config_obj)
        # FooBar(foo=Foo(x=100, y=456), bar1=Bar(a=0, b=1, f=Foo(x=200, y=1)), bar2=Bar(a=0, b=1, f=Foo(x=300, y=1)))
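
As a quick check of the sharing concern outside Hydra, the sketch below uses only the standard library. It assumes the field(default_factory=...) form, which works on all Python versions that have dataclasses; the names first and second are illustrative. Each FooBar() call builds its own nested Foo and Bar objects, so mutating one does not affect the others.

```python
from dataclasses import dataclass, field


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1
    # The factory runs once per Bar instance, so nothing is shared.
    f: Foo = field(default_factory=Foo)


@dataclass
class FooBar:
    foo: Foo = field(default_factory=Foo)
    bar1: Bar = field(default_factory=Bar)
    bar2: Bar = field(default_factory=Bar)


first = FooBar()
second = FooBar()

# Mutate one nested object and confirm the others keep their defaults.
first.bar1.f.x = 200

print(first)
print(second)
```

Only first.bar1.f changes; first.bar2.f and everything under second stay at their defaults.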