I have a number of YAML files to read in, each containing a list of lists. Here is a way to make some example data:
from time import time
import random
import yaml
# First make a list of lists
N = 2**17
lol = []
for _ in range(N):
    lol.append([random.uniform(0, 2) for _ in range(10)])
# Write the list of lists to a yaml file
with open('data.yml', 'w') as outfile:
    yaml.dump(lol, outfile, default_flow_style=True)
I want to read them in as quickly as possible. Unfortunately, PyYAML is slow.
# Now time how long it takes to read it back in
t = time()
with open("data.yml", "r") as f:
    lol = yaml.safe_load(f)
print(f"Reading took {round(time()-t, 2)} seconds")
This takes over 60 seconds for me. The file is 27 MB in size.
Is there a faster way to read in a YAML file of exactly this format?
YAML is a superset of JSON, and because your file was written in flow style (default_flow_style=True), your data happens to be valid JSON as well: a bracketed list of lists of numbers.
Thus, using json.load() seems to be the simplest way:
from time import perf_counter
import json
t = perf_counter()
with open("data.yml", "r") as f:
    lol = json.load(f)
print(perf_counter() - t)
$ python read.py
0.44149563400003444
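If some of your files might not be JSON-compatible (for example, if they were written in block style), a fallback could look something like the sketch below. This is only an illustration: load_lists is a hypothetical helper name, and the fallback assumes PyYAML is installed. It uses yaml.CSafeLoader when PyYAML was built with libyaml, which is usually much faster than the pure-Python loader, and otherwise drops back to yaml.SafeLoader.

```python
import json


def load_lists(path):
    """Load a file of nested lists, trying the fast JSON parser first.

    Hypothetical helper: the name and the fallback strategy are
    assumptions for illustration, not part of the original answer.
    """
    try:
        with open(path, "r") as f:
            return json.load(f)
    except json.JSONDecodeError:
        # Not valid JSON (e.g. block-style YAML), so fall back to PyYAML.
        import yaml  # imported lazily; only needed on the slow path

        # CSafeLoader is only available when PyYAML was built with libyaml.
        loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
        with open(path, "r") as f:
            return yaml.load(f, Loader=loader)
```

For files like the one generated in the question, load_lists("data.yml") should take the JSON path and never import yaml at all.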