python-3.x ruamel.yaml

Best way to use ruamel.yaml to dump YAML to string (NOT to stream)


In the past, I did something like some_fancy_printing_logging_func(yaml.dump(...), ...), using the backward-compatible part of ruamel.yaml, but I want to convert my code to use the latest API so that I can take advantage of some of the new formatting settings.

However, I hate that I have to specify a stream to ruamel.yaml.YAML.dump() ... I don't want it to write directly to a stream; I just want it to return the output to the caller. What am I missing?

PS: I know I can do something like the following, though of course I'm trying to avoid it.

import io

f = io.StringIO()
yml.dump(myobj, f)
# getvalue() returns everything written so far, no seek() needed
my_logging_func(f.getvalue())

Solution

  • I am not sure that you are missing anything. If anything, it is that when you are working with streams you should preferably continue to work with streams. Many users of ruamel.yaml and PyYAML seem to miss this, and therefore they do:

    print(dump(data))
    

    instead of

    dump(data, sys.stdout)
    

    The former might be fine for the unrealistic data used in the (PyYAML) documentation, but it leads to bad habits for real data, because the whole document is first built up in memory and only then written out.

    The best solution is to make your my_logging_func() stream oriented. This can e.g. be done as follows:

    import sys
    import ruamel.yaml
    
    data = dict(user='rsaw', question=47614862)
    
    class MyLogger:
        def write(self, s):
            # s is decoded because the dumper hands this stream UTF-8 encoded bytes
            sys.stdout.write(s.decode('utf-8'))
    
    my_logging_func = MyLogger()
    yml = ruamel.yaml.YAML()
    yml.dump(data, my_logging_func)
    

    which gives:

    user: rsaw
    question: 47614862
    

    but note that MyLogger.write() gets called multiple times (in this case eight times), and if you need to work on a line at a time, you have to do line buffering.
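The line buffering mentioned above could look roughly like the following. This is only a sketch: LineBufferedLogger and its emit_line callback are hypothetical names, not part of ruamel.yaml, and the loop at the end merely simulates a dumper that calls write() with small chunks.

```python
import codecs


class LineBufferedLogger:
    """Stream-like object that forwards complete lines to a callback.

    Hypothetical helper for illustration, not part of ruamel.yaml.
    """

    def __init__(self, emit_line):
        self._emit_line = emit_line  # called once per complete line
        self._pending = ''           # text received since the last newline
        # An incremental decoder copes with multi-byte characters split across chunks.
        self._decoder = codecs.getincrementaldecoder('utf-8')()

    def write(self, s):
        # Accept both bytes and str, since the chunk type depends on the caller.
        if isinstance(s, bytes):
            s = self._decoder.decode(s)
        self._pending += s
        # Emit every complete line; keep the trailing partial line buffered.
        *lines, self._pending = self._pending.split('\n')
        for line in lines:
            self._emit_line(line)

    def flush(self):
        # Emit whatever is left if the output did not end in a newline.
        if self._pending:
            self._emit_line(self._pending)
            self._pending = ''


collected = []
logger = LineBufferedLogger(collected.append)
# Simulated write() calls, standing in for what a dumper might send.
for chunk in [b'user', b':', b' ', b'rsaw', b'\n', b'question', b':', b' 47614862\n']:
    logger.write(chunk)
logger.flush()
print(collected)
```

An instance of such a class would be passed as the second argument to yml.dump() in place of MyLogger, with emit_line set to whatever handles a single log line.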

    If you really need to process your YAML as bytes or str, you can install the appropriate plugin (ruamel.yaml.bytes or ruamel.yaml.string, respectively) and do:

    import ruamel.yaml

    # requires the ruamel.yaml.string plugin to be installed
    yaml = ruamel.yaml.YAML(typ=['rt', 'string'])
    data = dict(abc=42, help=['on', 'its', 'way'])
    print('retval', yaml.dump_to_string(data))
    

    Or process the result of yaml.dump_to_string(data), or of its equivalent yaml.dumps(data), as you see fit. Replacing string with bytes in the above keeps the output as UTF-8 encoded bytes instead of decoding it back to str.
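If installing a plugin is not an option, the StringIO pattern from the question can at least be wrapped once so that callers get a string back. The sketch below assumes only that the dumper is called as dump(data, stream); dump_to_str is a hypothetical name, and json.dump is used purely as a stand-in with the same call shape so that the example runs without ruamel.yaml installed.

```python
import io
import json


def dump_to_str(dump, data):
    """Call dump(data, stream) on an in-memory stream and return the text.

    Hypothetical helper for illustration, not part of ruamel.yaml.
    """
    buf = io.StringIO()
    dump(data, buf)
    return buf.getvalue()


# Stand-in for illustration only: json.dump also takes (data, stream).
text = dump_to_str(json.dump, {'abc': 42})
print(text)
```

With the yml instance from the question, the call would presumably be dump_to_str(yml.dump, myobj). This still builds the whole document in memory, so it is only suitable for small outputs.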