How to configure ruamel.yaml.dump output?


With this data structure:

d = {
    (2,3,4): {
        'a': [1,2], 
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}

I would like to get this YAML:

%YAML 1.2
---
[2,3,4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'

Unfortunately I get this format:

$ print ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))
%YAML 1.2
---
? !!python/tuple
- 2
- 3
- 4
: a:
  - 1
  - 2
  b: Hello World!
  c: !!python/str 'Voilà!'

I cannot configure the output I want even with safe_dump. How can I do that without manual regex work on the output?

The only ugly solution I found is something like:

def rep(x):
    return repr([int(y) for y in re.findall('^\??\s*-\s*(\d+)', x.group(0), re.M)]) + ":\n"
print re.sub('\?(\s*-\s*(\w+))+\s*:', rep, 
    ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2)))

Solution

  • New ruamel.yaml API

    You cannot get what you want using ruamel.yaml.dump(), but with the new API, which has a few more controls, you can come very close.

    import sys
    import ruamel.yaml
    
    
    d = {
        (2,3,4): {
            'a': [1,2], 
            'b': 'Hello World!',
            'c': 'Voilà!'
        }
    }
    
    def prep(d):
        if isinstance(d, dict):
            needs_restocking = False
            for k in d:
                if isinstance(k, tuple):
                    needs_restocking = True
                try:
                    if 'à' in d[k]:
                        d[k] = ruamel.yaml.scalarstring.SingleQuotedScalarString(d[k])
                except TypeError:
                    pass
                prep(d[k])
            if not needs_restocking:
                return
            items = list(d.items())
            for (k, v) in items:
                d.pop(k)
            for (k, v) in items:
                if isinstance(k, tuple):
                    k = ruamel.yaml.comments.CommentedKeySeq(k)
                d[k] = v
        elif isinstance(d, list):
            for item in d:
                prep(item)
    
    yaml = ruamel.yaml.YAML()
    yaml.indent(mapping=2, sequence=4, offset=2)
    yaml.version = (1, 2)
    prep(d)  # modifies d in place and returns None
    yaml.dump(d, sys.stdout)
    

    which gives:

    %YAML 1.2
    ---
    [2, 3, 4]:
      a:
        - 1
        - 2
      b: Hello World!
      c: 'Voilà!'
    

    There is still no simple way to suppress the space after each comma in the flow-style sequence, so you cannot get [2,3,4] instead of [2, 3, 4] without some major effort.
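    If the compact [2,3,4] form matters enough to accept a post-processing step, the dumped text can be adjusted afterwards. The following is only a minimal sketch using the standard library, not a feature of ruamel.yaml: the helper name compact_flow_keys is hypothetical, and it assumes that such keys contain only simple scalars (no nested brackets, no quoted strings with commas) and that no other line in the document starts with a bracketed sequence followed by a colon.

```python
import re

# Matches a flow-style sequence used as a mapping key at the start of a line,
# e.g. "[2, 3, 4]:". Assumption: the key holds only simple scalars.
FLOW_KEY = re.compile(r'^([ \t]*)\[([^\[\]\n]*)\](?=:(?:[ \t]|$))', re.MULTILINE)


def compact_flow_keys(text):
    """Remove the spaces after the commas inside flow-sequence keys."""
    def repl(match):
        indent, body = match.group(1), match.group(2)
        items = [item.strip() for item in body.split(',')]
        return '{}[{}]'.format(indent, ','.join(items))
    return FLOW_KEY.sub(repl, text)


dumped = (
    "%YAML 1.2\n"
    "---\n"
    "[2, 3, 4]:\n"
    "  a:\n"
    "    - 1\n"
    "    - 2\n"
    "  b: Hello World!\n"
    "  c: 'Voilà!'\n"
)
print(compact_flow_keys(dumped))
```

    Flow sequences that appear as values (for example a: [1, 2]) are left untouched, because the pattern only matches at the start of a line.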

    Original answer:


    You cannot get exactly what you want as output using ruamel.yaml.dump() without major rework of the internals.

    The less difficult issues can be dealt with, though. If you do:

    import sys
    import ruamel.yaml
    from ruamel.yaml.comments import CommentedMap, CommentedKeySeq
    assert ruamel.yaml.version_info >= (0, 12, 14)
    
    data = CommentedMap()
    data[CommentedKeySeq((2, 3, 4))] = cm = CommentedMap()
    cm['a'] = [1, 2]
    cm['b'] = 'Hello World!'
    cm['c'] = ruamel.yaml.scalarstring.SingleQuotedScalarString('Voilà!')
    
    ruamel.yaml.round_trip_dump(data, sys.stdout, explicit_start=True, version=(1, 2))
    

    you will get:

    %YAML 1.2
    ---
    [2, 3, 4]:
      a:
      - 1
      - 2
      b: Hello World!
      c: 'Voilà!'
    

    which is as close as you can get to what you want without major rework. The remaining differences are the now consistent indentation level of 2, the extra spaces in the flow-style sequence, and the required use of round_trip_dump().

    Whether the above code is ugly as well or not is of course a matter of taste.

    The output will, not coincidentally, round-trip correctly when loaded using ruamel.yaml.round_trip_load() with preserve_quotes=True.


    If you need neither control over the quotes nor a particular order of your mapping keys, you can also patch the normal dumper:

    def my_key_repr(self, data):
        # emit tuple keys as flow-style sequences instead of !!python/tuple
        if isinstance(data, tuple):
            return self.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                           flow_style=True)
        # every other key type falls back to the default representation
        return ruamel.yaml.representer.SafeRepresenter.represent_key(self, data)
    
    ruamel.yaml.representer.Representer.represent_key = my_key_repr
    

    Then you can use a plain tuple as the key of a normal dict:

    data = {}
    data[(2, 3, 4)] = cm = {}
    cm['a'] = [1, 2]
    cm['b'] = 'Hello World!'
    cm['c'] = 'Voilà!'
    
    ruamel.yaml.dump(data, sys.stdout, allow_unicode=True, explicit_start=True, version=(1, 2))
    

    which will give you:

    %YAML 1.2
    ---
    [2, 3, 4]:
      a: [1, 2]
      b: Hello World!
      c: Voilà!
    

    Please note that you need to explicitly allow Unicode in your output using allow_unicode=True (this is the default with round_trip_dump()).


    ¹ Disclaimer: I am the author of ruamel.yaml.