pytorch fb-hydra omegaconf

How to reload hydra config with enumerations


Is there a better way to reload a hydra config from an experiment with enumerations? Right now I reload it like so:

initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> ENUM1

But ENUM1 is actually an enumeration that normally loads as

>>> <SomeEnumClass.ENUM1: 'enum1'>

I am able to fix this by adding a ConfigStore default to the experiment's Hydra config file:

defaults:
  - base_config_cs

This now results in:

initialize_config_dir(config_dir=os.path.join(exp_dir, ".hydra"), job_name=config_name)
cfg = compose(config_name, overrides=overrides)
print(cfg.enum)
>>> <SomeEnumClass.ENUM1: 'enum1'>

Is there a better way to do this without editing the saved config file? Or can I add the default in the Python code?


Solution

  • This is a good question -- reliably reloading configs from previous Hydra runs is an area that could be improved. As you've discovered, loading the saved file config.yaml directly results in an untyped DictConfig object.

    The solution below involves a script called reload.py that creates a config node with a defaults list that loads both the schema base_config_cs and the saved file config.yaml.

    At the end of this post I also give a simple solution that involves loading .hydra/overrides.yaml to re-run the config composition process.


    Suppose you've run a Hydra job with the following setup:

    # app.py
    from dataclasses import dataclass
    from enum import Enum
    import hydra
    from hydra.core.config_store import ConfigStore
    from omegaconf import DictConfig
    
    class SomeEnumClass(Enum):
        ENUM1 = 1
        ENUM2 = 2
    
    @dataclass
    class Schema:
        enum: SomeEnumClass
        x: int = 123
        y: str = "abc"
    
    def store_schema() -> None:
        cs = ConfigStore.instance()
        cs.store(name="base_config_cs", node=Schema)
    
    @hydra.main(config_path=".", config_name="foo")
    def app(cfg: DictConfig) -> None:
        print(cfg)
    
    if __name__ == "__main__":
        store_schema()
        app()
    
    # foo.yaml
    defaults:
      - base_config_cs
      - _self_
    enum: ENUM1
    x: 456
    
    $ python app.py y=xyz
    {'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
    

    After running app.py, there exists a directory outputs/2022-02-05/06-42-42/.hydra containing the saved file config.yaml.

    As you correctly pointed out in your question, to reload the saved config you must merge the schema base_config_cs with the contents of config.yaml. Here is a pattern for accomplishing that:

    # reload.py
    import os
    from hydra import compose, initialize_config_dir
    from hydra.core.config_store import ConfigStore
    from app import store_schema
    
    config_name = "config"
    exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
    saved_cfg_dir = os.path.join(exp_dir, ".hydra")
    assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")
    
    store_schema()  # stores `base_config_cs`
    cs = ConfigStore.instance()
    cs.store(
        name="reload_conf",
        node={
            "defaults": [
                "base_config_cs",
                config_name,
            ]
        },
    )
    
    with initialize_config_dir(config_dir=saved_cfg_dir):
        cfg = compose("reload_conf")
    print(cfg)
    
    $ python reload.py
    {'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
    

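    In the Python file reload.py above, we store a node called reload_conf in the ConfigStore. Storing reload_conf this way is equivalent to creating a file called reload_conf.yaml that is discoverable by Hydra on the config search path. This reload_conf node has a defaults list that loads both the schema base_config_cs and config. For this to work, the following two conditions must be met:

      • the schema base_config_cs must be stored in the ConfigStore (hence the call to store_schema()), and
      • a file with the name given by config_name (config.yaml in this example) must be discoverable by Hydra on the config search path (hence the call to initialize_config_dir).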

    Note that in foo.yaml we have a defaults list ["base_config_cs", "_self_"] that loads the schema base_config_cs before loading the contents _self_ of foo. In order for reload_conf to reconstruct the app's config with the same merge order, base_config_cs should come before config_name in the defaults list belonging to reload_conf.


    The above approach could be taken one step further by removing the defaults list from foo.yaml and using cs.store to ensure the same defaults list is used in both the app and the reloading script:

    # app2.py
    from dataclasses import dataclass
    from enum import Enum
    from typing import Any, List
    import hydra
    from hydra.core.config_store import ConfigStore
    from omegaconf import MISSING, DictConfig
    
    class SomeEnumClass(Enum):
        ENUM1 = 1
        ENUM2 = 2
    
    @dataclass
    class RootConfig:
        defaults: List[Any] = MISSING
        enum: SomeEnumClass = MISSING
        x: int = 123
        y: str = "abc"
    
    def store_root_config(primary_config_name: str) -> None:
        cs = ConfigStore.instance()
        # defaults list defined here:
        cs.store(
            name="root_config", node=RootConfig(defaults=["_self_", primary_config_name])
        )
    
    @hydra.main(config_path=".", config_name="root_config")
    def app(cfg: DictConfig) -> None:
        print(cfg)
    
    if __name__ == "__main__":
        store_root_config("foo2")
        app()
    
    # foo2.yaml (note NO DEFAULTS LIST)
    enum: ENUM1
    x: 456
    
    $ python app2.py hydra.job.chdir=false y=xyz
    {'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
    
    # reload2.py
    import os
    from hydra import compose, initialize_config_dir
    from hydra.core.config_store import ConfigStore
    from app2 import store_root_config
    
    config_name = "config"
    exp_dir = os.path.abspath("outputs/2022-02-05/07-45-43")
    saved_cfg_dir = os.path.join(exp_dir, ".hydra")
    assert os.path.exists(f"{saved_cfg_dir}/{config_name}.yaml")
    
    store_root_config("config")
    with initialize_config_dir(config_dir=saved_cfg_dir):
        cfg = compose("root_config")
    print(cfg)
    
    $ python reload2.py
    {'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
    

    A simpler alternative approach is to use .hydra/overrides.yaml to recompose the app's configuration based on the overrides that were originally passed to Hydra:

    # reload3.py
    import os
    import yaml
    from hydra import compose, initialize
    from app import store_schema
    
    config_name = "config"
    exp_dir = os.path.abspath("outputs/2022-02-05/07-19-56")
    saved_cfg_dir = os.path.join(exp_dir, ".hydra")
    overrides_path = f"{saved_cfg_dir}/overrides.yaml"
    assert os.path.exists(overrides_path)
    
    # overrides.yaml is a plain list of strings, so safe_load is sufficient
    with open(overrides_path, "r") as f:
        overrides = yaml.safe_load(f)
    print(f"{overrides=}")
    store_schema()
    with initialize(config_path="."):
        cfg = compose("foo", overrides=overrides)
    print(cfg)
    
    $ python reload3.py
    overrides=['y=xyz']
    {'enum': <SomeEnumClass.ENUM1: 1>, 'x': 456, 'y': 'xyz'}
    

    This approach has its drawbacks: if your app's configuration involves some non-hermetic operation like querying a timestamp (e.g. via Hydra's now resolver) or looking up an environment variable (e.g. via the oc.env resolver), the configuration composed by reload3.py might be different from the original version loaded in app.py.