I want to use Python to read and write YAML frontmatter in markdown files. I have come across the ruamel.yaml package but am having trouble understanding how to use it for this purpose.
If I have a markdown file:
---
car:
  make: Toyota
  model: Camry
---
# My Ultimate Car Review
This is a good car.
For one, is there a way to set the yaml data to variables in my python code?
Second, is there a way to set new values to the yaml in the markdown file?
For the first, I have tried:
from ruamel.yaml import YAML
import sys
f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.
yaml = YAML()
code = yaml.load(f)
print(code['car']['make'])
but get an error:
ruamel.yaml.composer.ComposerError: expected a single document in the stream
in "cars.txt", line 2, column 1
but found another document
in "cars.txt", line 5, column 1
For the second, I have tried:
from ruamel.yaml import YAML
import sys
f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.
yaml = YAML()
code = yaml.load(f)
code['car']['model'] = 'Sequoia'
but get the same error:
ruamel.yaml.composer.ComposerError: expected a single document in the stream
in "cars.txt", line 2, column 1
but found another document
in "cars.txt", line 5, column 1
When you have multiple YAML documents in one file, these are separated by a line consisting of three dashes, or starting with three dashes followed by a space. Most YAML parsers, including ruamel.yaml, expect either a single-document file (when using YAML().load()) or a multi-document file (when using YAML().load_all()).
The method .load() returns the single data structure, and complains if there seems to be more than one document (i.e. when it encounters the second --- in your file). The .load_all() method can handle one or more YAML documents, but always returns an iterator.
Your input happens to be a valid multi-document YAML file, but the markdown part often makes this not the case. Such files could always have been valid YAML by just changing the second --- into --- | thereby making the markdown part a (multi-line) literal scalar string. I have no idea why the designers of such YAML frontmatter formats didn't specify that; it might have to do with some parsers (like PyYAML) failing to parse such non-indented literal scalar strings at the root level correctly, although examples of those are in the YAML specification.
In your example the markdown part is so simple that it is valid YAML without having to specify the | for a literal scalar string, so you could use .load_all() on this input. But just adding e.g. a line starting with a dash to the markdown section will result in an invalid YAML document, so if you use .load_all(), you have to make sure you do not iterate so far as to parse the second document:
import sys
from pathlib import Path
import ruamel.yaml
path = Path('cars.txt')
yaml = ruamel.yaml.YAML()
for data in yaml.load_all(path):
    break
print(data['car']['make'])
which gives:
Toyota
You shouldn't try to update the file in place, however (so don't use r+), as your YAML frontmatter might be longer than the original and updating would overwrite your markdown. For updating, read the file into memory, split it into two parts based on the second line of dashes, update the data, dump it and append the dashes and markdown:
import sys
from pathlib import Path
import ruamel.yaml
path = Path('cars.txt')
opath = Path('cars_out.txt')
yaml_str, markdown = path.read_text().lstrip().split('\n---', 1)
yaml_str += '\n' # re-add the trailing newline that was split off
yaml = ruamel.yaml.YAML()
yaml.explicit_start = True
data = yaml.load(yaml_str)
data['car']['year'] = 2003
with opath.open('w') as fp:
    yaml.dump(data, fp)
    fp.write('---')
    fp.write(markdown)
sys.stdout.write(opath.read_text())
which gives:
---
car:
  make: Toyota
  model: Camry
  year: 2003
---
# My Ultimate Car Review
This is a good car.
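The split on '\n---' above matches any line that merely starts with three dashes. If that is a concern, a line-based split is one possible alternative. The helper below is a hypothetical sketch (split_frontmatter is not part of ruamel.yaml) and assumes the closing delimiter is a line containing exactly three dashes:

```python
def split_frontmatter(text):
    """Return (yaml_str, markdown) for text that starts with a '---' line.

    Hypothetical helper: only a line that is exactly '---' (ignoring
    trailing whitespace) counts as the closing delimiter.
    """
    lines = text.lstrip().splitlines(keepends=True)
    if not lines or lines[0].rstrip() != '---':
        raise ValueError('no frontmatter found')
    for i, line in enumerate(lines[1:], start=1):
        if line.rstrip() == '---':
            # The YAML part keeps its opening '---'; the closing line is dropped
            return ''.join(lines[:i]), ''.join(lines[i + 1:])
    raise ValueError('frontmatter is not closed')


text = "---\ncar:\n  make: Toyota\n---\n# Title\n---not a delimiter\nBody\n"
yaml_str, markdown = split_frontmatter(text)
print(repr(yaml_str))
print(repr(markdown))
```

Unlike the split() version, the markdown returned here does not start with a newline, so when writing the file back you would write '---\n' rather than '---' before the markdown.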