Imagine a complex config that specifies many ML models, each with some number of layers. Something like
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Layer:
        width: int
        activation: str
        ...

    @dataclass
    class Model:
        layers: List[Layer]

    @dataclass
    class Forest:
        models: Dict[str, Model]
Then, I have a YAML file as my main config where I define a forest of models. On the command line, I can override individual fields of some layer with something like models.cnn.layers.1.width=10. I would like to have a directory of various layers and be able to do something like models.cnn.layers.1=wide_cnn on the command line, where wide_cnn.yaml is some YAML file in that directory.
I tried various things, such as using wide_cnn.yaml, using an absolute path, using models/cnn/layers.1, etc. It seems the core issue is that for Hydra to load a file, the text before the = on the command line (call it a "key") must be a config group. I don't quite understand what makes something a config group, but it seems to correspond to a directory structure. In my case, the key can be basically anything because users can name their models arbitrarily.
Any suggestions how to do something like this? Thanks!
Hydra comes pretty close, but list composition is not supported. In general, this is pushing Hydra to its limits, and I am not sure it can be done conveniently.
You can "assign a config file" only if it is part of a config group. You can also "assign a list of config files" to compose a dictionary. You can specify a top-level model config file that includes multiple config files from the same config group. (Note: this will create a dictionary of layers, not a list of layers.)
Something like this:
    model/
      model1.yaml
      model2.yaml
      layers/
        layer1.yaml
        layer2.yaml
        layer3.yaml

model1.yaml:

    defaults:
      - layers:
          - layer1
          - layer3

model2.yaml:

    defaults:
      - layers:
          - layer2
          - layer3
Note that the layer files need to specify a sub node; otherwise they will all be merged into the same node. You can achieve this by changing their structure or by overriding the package in the defaults list.
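To make the sub node point concrete, here is a rough sketch in plain Python. It does not use Hydra or OmegaConf; the merge function only imitates a recursive "later value wins" merge, and the layer contents (widths, activations) are made-up values for illustration. It shows why flat layer files collide when merged into the same node, why wrapping each one in its own sub node keeps them apart, and one possible way to turn the resulting dictionary back into the list that the Model dataclass expects.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Layer:
    width: int
    activation: str


def merge(base: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `other` into a copy of `base`; later values win.

    This is a stand-in for config merging, not Hydra's actual implementation.
    """
    out = dict(base)
    for key, value in other.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


# Hypothetical contents of layer1.yaml and layer3.yaml, written flat.
layer1 = {"width": 10, "activation": "relu"}
layer3 = {"width": 64, "activation": "tanh"}

# Flat files merged into the same node: layer3 overwrites layer1.
collided = merge(layer1, layer3)

# Each file wrapped in its own sub node: both layers survive.
layers_node = merge({"layer1": layer1}, {"layer3": layer3})


def layers_as_list(node: Dict[str, Dict[str, Any]]) -> List[Layer]:
    """Convert the composed dictionary of layers to a list, ordered by key."""
    return [Layer(**node[name]) for name in sorted(node)]


layers = layers_as_list(layers_node)
```

The ordering here is a plain string sort of the keys, so names like layer10 would sort before layer2. If order matters, pick key names that sort the way you want, or keep an explicit order field in each layer.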