
Python ruamel.yaml - ensure quotes on specific keys


Following on from this question: JSON to YAML in Python : How to get correct string manipulation?

I'd like the values of specific keys (regardless of nesting depth) to always be output in double-quoted style. This is the starting point:

import sys
from ruamel.yaml import YAML

data = {
    'simple': 'nonquoted',
    'grouping': 'quoted',
    'deep': {
        'simple': 'nonquoted',
        'grouping': 'quoted',
    }
}

def test():
    yaml = YAML()
    yaml.default_flow_style = False
    yaml.dump(data, sys.stdout)

test()

which currently gives:

simple: nonquoted
grouping: quoted
deep:
  simple: nonquoted
  grouping: quoted

I'd like it to be:

simple: nonquoted
grouping: "quoted"
deep:
  simple: nonquoted
  grouping: "quoted"

I've looked at adding representers, but I got stuck.

I have Python 3.10.13 and ruamel.yaml 0.18.6.

Update: I'm able to do this by walking the tree and applying the ruamel.yaml-specific type DoubleQuotedScalarString, but maybe there is a way to do it on dump?

from ruamel.yaml.scalarstring import DoubleQuotedScalarString

# Function to preprocess in-place the dictionary to quote values for specific keys
def preprocess_dict_in_place(data, doubleQuoteKeys={}):
    if isinstance(data, dict):
        for key in list(data.keys()):
            if key in doubleQuoteKeys:
                data[key] = DoubleQuotedScalarString(data[key])
            elif isinstance(data[key], dict):
                preprocess_dict_in_place(data[key], doubleQuoteKeys)
            elif isinstance(data[key], list):
                for index in range(len(data[key])):
                    preprocess_dict_in_place(data[key][index], doubleQuoteKeys)
    elif isinstance(data, list):
        for index in range(len(data)):
            preprocess_dict_in_place(data[index], doubleQuoteKeys)  # pass the keys on when recursing into list items

Solution

  • In my experience, pre-processing similar to what you do is the way to go. This is especially true because you can do it in place: you only change the values of a dict instead of rebuilding a complete data structure.

    You can do this in a representer, but not by creating an alternative for .represent_dict(). That method just diverts the actual work to represent_mapping(), which you would have to monkey-patch (or subclass the RoundTripRepresenter class and add your own represent_mapping() there). Beyond that method, the output process has no knowledge of whether a string being represented is a key or a value. It also does not know which key a specific value belongs to while that value is being output.

    Going that route would require you to attach the set/list of strings in doubleQuoteKeys to the RoundTripRepresenter instance so that represent_mapping() can access it. That is the easy part: just create an otherwise unused attribute. represent_mapping(), however, is a 90-line piece of code that you would have to copy and adjust.

    Alternatively, you could duplicate and modify the data structure that represent_dict() hands to represent_mapping(), as represent_dict() is much smaller (14 lines). I recommend against modifying that data structure in place, as that would change your program's data structure in a very obscure way.

    There are a few simplifications I would make to your code (without having run it):

    from ruamel.yaml.scalarstring import DoubleQuotedScalarString

    def preprocess_dict_in_place(data, doubleQuoteKeys=None):
        if doubleQuoteKeys is None:
            doubleQuoteKeys = set()  # no need to make a dict, as you don't use its values
        if isinstance(data, dict):
            for key, value in data.items():
                # replacing the value of an existing key does not change the dict's size,
                # so doing it while iterating is fine
                if key in doubleQuoteKeys and not isinstance(value, DoubleQuotedScalarString):
                    data[key] = DoubleQuotedScalarString(value)
                else:
                    preprocess_dict_in_place(value, doubleQuoteKeys)
        elif isinstance(data, list):
            for elem in data:
                preprocess_dict_in_place(elem, doubleQuoteKeys)
    

    A key in YAML can be a collection. ruamel.yaml supports loading those, but as non-modifiable collections, because Python requires dict keys to be hashable. So in principle your code would have to handle that as well. That complicates things considerably, though, and you probably know whether your data structure contains such keys, so I would not add the code for that.

    I am also not 100% sure whether DoubleQuotedScalarString() can handle a DoubleQuotedScalarString as its argument, hence the "and not ..." part.