Tags: fb-hydra, omegaconf

Schema validation in Hydra not working when configuration path is parent folder


I have the following project setup:

configs/
├── default.yaml
└── trainings
    ├── data_config
    │   └── default.yaml
    ├── simple.yaml
    └── schema.yaml

The contents of the files are as follows:

app.py:

from dataclasses import dataclass
from enum import Enum
from pathlib import Path

from omegaconf import MISSING, DictConfig, OmegaConf

import hydra
from hydra.core.config_store import ConfigStore

CONFIGS_DIR_PATH = Path(__file__).parent / "configs"
TRAININGS_DIR_PATH = CONFIGS_DIR_PATH / "trainings"


class Sampling(Enum):
    UPSAMPLING = 1
    DOWNSAMPLING = 2


@dataclass
class DataConfig:
    sampling: Sampling = MISSING


@dataclass
class TrainerConfig:
    project_name: str = MISSING
    data_config: DataConfig = MISSING


# @hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
@hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    sampling = OmegaConf.to_container(cfg=configuration, resolve=True)["data_config"]["sampling"]
    print(f"{sampling} Type: {type(sampling)}")


def register_schemas():
    config_store = ConfigStore.instance()
    config_store.store(name="base_schema", node=TrainerConfig)


if __name__ == "__main__":
    register_schemas()
    run()

configs/default.yaml:

defaults:
  - /trainings@: simple
  - _self_
project_name: test

configs/trainings/simple.yaml:

defaults:
  - base_schema
  - data_config: default
  - _self_

project_name: test

configs/trainings/schema.yaml:

defaults:
  - data_config: default
  - _self_

project_name: test

configs/trainings/data_config/default.yaml:

defaults:
  - _self_
sampling: DOWNSAMPLING

Now, when I run app.py as shown above, I get the expected result (namely, "DOWNSAMPLING" is resolved to an enum member). However, when I run the application so that it builds the configuration from default.yaml in the parent directory, it fails.

That is, when the code looks like this:

...
@hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
# @hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
...

I get the error below:

In 'trainings/simple': Could not load 'trainings/base_schema'.

Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///data/code/demos/hydra/configs
        provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I do not understand why specifying the schema to be used is causing this issue. Would someone have an idea why and what could be done to fix the problem?


Solution

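  • If you are using defaults lists in more than one config file, I strongly suggest that you fully read and understand The Defaults List page of the Hydra documentation. Configs addressed in a defaults list are resolved relative to the config group of the containing config. The error is telling you that Hydra is looking for base_schema in trainings, because the defaults list that loads base_schema is in trainings. This is also why the first setup worked: with config_path pointing at trainings, simple sits at the root of the search path, so base_schema is looked up at the root, which is where it was registered.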

    Either put base_schema inside trainings when you register it:

    config_store.store(group="trainings", name="base_schema", node=TrainerConfig)
    

    Or use absolute addressing in the defaults list when addressing it (e.g. in configs/trainings/simple.yaml):

    defaults:
      - /base_schema
      - data_config: default
      - _self_
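To make the relative-versus-absolute rule concrete, here is a minimal sketch in plain Python. It is a simplified model of the resolution rule described above, not Hydra's actual implementation; the function name resolve_default is hypothetical and exists only for this illustration.

```python
from pathlib import PurePosixPath


def resolve_default(containing_config: str, entry: str) -> str:
    """Return the config path a defaults-list entry points to.

    Simplified model (an assumption, not Hydra's real code): an entry
    starting with "/" is absolute; anything else is resolved relative to
    the config group (parent path) of the config that contains it.
    """
    if entry.startswith("/"):
        # Absolute addressing: ignore the containing config's group.
        return entry.lstrip("/")
    group = PurePosixPath(containing_config).parent
    return str(group / entry)


# config_path=trainings, config_name=simple: "simple" is at the root.
print(resolve_default("simple", "base_schema"))             # base_schema
# config_path=configs: "simple" is loaded as "trainings/simple".
print(resolve_default("trainings/simple", "base_schema"))   # trainings/base_schema
# Absolute addressing finds the schema registered at the root.
print(resolve_default("trainings/simple", "/base_schema"))  # base_schema
```

The second call reproduces the path from the error message ('trainings/base_schema'), and the third shows why prefixing the entry with a slash finds a schema stored without a group.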