I am trying to write a hierarchical configuration structure such that config files in the inner directories inherit from the config files in the outer directories. For example, in the following scenario
upper_config
|
|-middle_config
| |
| |-lower_config
I would like middle_config to be able to inherit & override the parameters of upper_config, and lower_config to be able to inherit & override the parameters of both middle_config and upper_config.
One solution would be to write a configuration parser such that outer modules are read first, and as inner modules are read they overwrite the fields in the outer modules.
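A minimal sketch of such a hand-rolled parser, assuming each config file has already been loaded into a plain Python dict. The helper names (`deep_merge`, `compose_outer_to_inner`) and the parameter names (`lr`, `model`, and so on) are hypothetical and only serve to illustrate the outer-first, inner-overrides order:

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the inner (later) value replaces the outer one.
            merged[key] = deepcopy(value)
    return merged


def compose_outer_to_inner(configs: list) -> dict:
    """Merge configs in order, outermost first, so inner configs override."""
    result: dict = {}
    for cfg in configs:
        result = deep_merge(result, cfg)
    return result


# Hypothetical contents of the three config levels.
upper_config = {"lr": 0.1, "model": {"layers": 2, "dropout": 0.0}}
middle_config = {"model": {"layers": 4}}
lower_config = {"lr": 0.01, "model": {"dropout": 0.5}}

composed = compose_outer_to_inner([upper_config, middle_config, lower_config])
print(composed)
```

Here lower_config overrides `lr` and `dropout`, middle_config overrides `layers`, and nothing from upper_config survives unless no inner level redefines it.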
However, I would like to use Hydra (or some other tool, open to suggestions) for all of the added conveniences. I've read the documentation front to back a few times, and though it feels like either config groups or package directives should be able to handle this, I can't quite piece it together.
I believe this post asks a very similar question, but the answer hasn't enlightened me, and it seems that the person who asked the question decided to implement a version of the config parser I described above.
I am hoping that there is a way for an inner config file's package directive to be changed to point to a parent configuration and somehow inherit its defaults list that way.
Suppose we have the following files:
my_app.py
outer/conf1.yaml
outer/middle/conf2.yaml
outer/middle/inner/conf3.yaml
To make things concrete, here are the contents of my_app.py:
import hydra, omegaconf

@hydra.main(config_path="outer", config_name="conf1")
def my_app(cfg) -> None:
    # Print the fully composed config as YAML.
    print(omegaconf.OmegaConf.to_yaml(cfg))

my_app()
If your yaml files just contain plain data (i.e. no defaults lists or package directives), the most flexible approach to dynamically composing your config at the command line looks like this:
$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3
This will merge outer/middle/conf2.yaml on top of outer/conf1.yaml, then merge outer/middle/inner/conf3.yaml on top of that. The @_global_ keyword means that the input configs should be merged at the top level instead of being nested according to the names of their containing directories.
In answering this question, I use some features from the recent release for Hydra 1.1:
>>> import hydra
>>> hydra.__version__
'1.1.0.rc1'
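There are a few approaches we could take to overriding our outer configuration with middle/inner configuration: using an @<PACKAGE> keyword in the defaults list, using a package directive at the top of each yaml file, or using a command-line package override.
Here are the details for each approach: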
Suppose we have the following:
In outer/conf1.yaml:
defaults:
- _self_
- middle@_here_: conf2
a: 1
b: 2
In outer/middle/conf2.yaml:
defaults:
- _self_
- inner@_here_: conf3
b: 3
c: 4
In outer/middle/inner/conf3.yaml:
c: 5
d: 6
With these yaml files, running my_app.py gives the following result:
$ python my_app.py
a: 1
b: 3
c: 5
d: 6
As you can see, conf1 is being overridden by conf2, which is in turn being overridden by conf3. So, how does this work? The defaults list is used to specify the order in which each configuration object is composed. In conf1, the @_here_ package keyword is used to specify that conf2 should be merged into the current config group instead of being included in the middle package. This is documented in Default List package keywords.
Also of interest is the @_global_ keyword. Note that one could just as well write - middle@foo: conf2 instead of - middle@_here_: conf2 in the defaults list, in which case a "foo" key would appear in the output config with the contents of conf2 nested under it.
Just as in conf1.yaml, conf2.yaml is using the defaults list to specify that conf3 should be merged into conf2 instead of being merged into a package named "inner" (which would have been the default behavior, as is documented here).
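To illustrate the difference between merging with @_here_ and nesting under a named package, here is a toy model in plain Python. It is not Hydra's implementation: the `place` helper is hypothetical, it only models a top-level config (where @_here_ means "merge at the current level"), and it ignores conf2's own defaults list for brevity.

```python
def place(config: dict, incoming: dict, package: str) -> dict:
    """Toy model: merge `incoming` into `config` at the given package.

    "_here_" is modelled as merging at the current level; any other
    name is modelled as nesting `incoming` under that key.
    """
    result = dict(config)
    if package == "_here_":
        result.update(incoming)
    else:
        nested = dict(result.get(package, {}))
        nested.update(incoming)
        result[package] = nested
    return result


conf1 = {"a": 1, "b": 2}
conf2 = {"b": 3, "c": 4}

merged_here = place(conf1, conf2, "_here_")   # like "- middle@_here_: conf2"
nested_foo = place(conf1, conf2, "foo")       # like "- middle@foo: conf2"
nested_middle = place(conf1, conf2, "middle") # like "- middle: conf2"

print(merged_here)    # {'a': 1, 'b': 3, 'c': 4}
print(nested_foo)     # {'a': 1, 'b': 2, 'foo': {'b': 3, 'c': 4}}
print(nested_middle)  # {'a': 1, 'b': 2, 'middle': {'b': 3, 'c': 4}}
```

Only the @_here_ variant lets conf2 override the top-level b; in the other two cases conf2's values live under their own key and conf1's b is left untouched.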
What is the - _self_ keyword doing?
In a defaults list, this keyword controls the order in which the current config is merged with the other input configs specified in the defaults list. For example, in the conf2.yaml defaults list, writing - _self_ before - inner@_here_: conf3 ensures that conf3 will be merged into conf2, and not the other way around. This _self_ keyword is documented here. If - _self_ is not specified in the defaults list, then the order in which the defaults are merged with the current config depends on the Hydra version. For reference, see these migration instructions for moving from version 1.0 to 1.1.
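The effect of where _self_ sits can be sketched with a small simulation in plain Python. This is an assumption-laden toy, not Hydra's code: `compose_in_order` is a hypothetical helper that merges flat dicts in defaults-list order, with later entries winning.

```python
def compose_in_order(defaults: list, own_content: dict, options: dict) -> dict:
    """Toy model: merge entries in defaults-list order; later entries win.

    "_self_" stands for the content of the config that owns the list.
    """
    result: dict = {}
    for entry in defaults:
        source = own_content if entry == "_self_" else options[entry]
        result.update(source)
    return result


conf2_content = {"b": 3, "c": 4}
options = {"conf3": {"c": 5, "d": 6}}

# _self_ first: conf3 is merged on top of conf2, so conf3's c wins.
self_first = compose_in_order(["_self_", "conf3"], conf2_content, options)

# _self_ last: conf2 is merged on top of conf3, so conf2's c wins.
self_last = compose_in_order(["conf3", "_self_"], conf2_content, options)

print(self_first)  # {'b': 3, 'c': 5, 'd': 6}
print(self_last)   # {'c': 4, 'd': 6, 'b': 3}
```

With _self_ first, the inner config overrides the outer one, which is the behavior wanted in this answer.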
Using a package directive at the top of a yaml file can achieve a similar result:
In outer/conf1.yaml:
defaults:
- _self_
- middle: conf2
a: 1
b: 2
In outer/middle/conf2.yaml:
# @package _global_
defaults:
- _self_
- inner: conf3
b: 3
c: 4
In outer/middle/inner/conf3.yaml:
# @package _global_
c: 5
d: 6
The # @package <PACKAGE> directive specifies where the contents of the current input config should be placed.
$ python my_app.py
a: 1
b: 3
c: 5
d: 6
This works much the same way as using an @<PACKAGE> keyword in the defaults list (as detailed in the previous section), and the result at the command line is identical. One difference between these two approaches is that a package header applies to all contents of the given input config, whereas using an @<PACKAGE> keyword in the defaults list gives more granular control over which input configs should be placed into which packages.
Using the - _self_ keyword in the defaults list is still necessary to ensure that the merge happens in the correct order (see the previous section for notes on _self_).
Hydra's treatment of package headers is different in Hydra 1.0 vs 1.1.
The most elegant and flexible way to achieve the desired result is using a command-line package override:
Given outer/conf1.yaml as follows:
a: 1
b: 2
And outer/middle/conf2.yaml thus:
b: 3
c: 4
and outer/middle/inner/conf3.yaml:
c: 5
d: 6
We can use Hydra's powerful command-line override syntax to compose the output config:
$ python my_app.py +middle@_global_=conf2 +middle/inner@_global_=conf3
a: 1
b: 3
c: 5
d: 6
Using the _self_ keyword is not necessary with this approach because the +<group>@<package>=<option> syntax appends to the defaults list (here is a reference) rather than prepending to it, so each added config is merged after the ones already listed.