Pandoc has a filter (pandoc-plot) that accepts Python snippets and uses, for example, Matplotlib to generate charts. I want to write documents that generate many charts from a common data source (e.g. a pandas data frame).
As an example:
Here's the first chart:
~~~{.matplotlib}
import sqlite3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
conn = sqlite3.connect('somedb.db')
query = '''SELECT something'''
df = pd.read_sql_query(query, conn).dropna()
fig, ax = plt.subplots()
ax.something()
~~~
The problem is that every chart has to regenerate the data frame, which is expensive. What I'd like to do is build the data frame once and reuse it in every chart.
Any ideas?
The author of pandoc-plot kindly provided the following answer on GitHub:
Out-of-the-box there's no handling of your use-case in the pandoc-plot filter. Each code block that gets turned into a plot is intended to be independent from all others. This has many benefits, most importantly performance -- I wrote pandoc-plot for book-sized workloads, with close to 100 figures.
Using a preamble doesn't help because the preamble script gets copy-pasted into every code block before pandoc-plot renders a figure. Therefore, the creation of your dataframe will still be duplicated.
I would recommend you proceed with a script to wrap your usage of pandoc. For example (assuming you use bash):
# Run a script that goes through your expensive computation,
# storing the results as a CSV file
python create-data.py
# Render the document, where plots can reference the file created by
# your python script instead of re-creating the pandas dataframe for every plot
pandoc --filter pandoc-plot ...
# Clean up temporary data file if you know where it is
You can communicate between the bash script above and your document plots using environment variables.
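As a rough illustration of what `create-data.py` might contain, here is a minimal sketch that runs the expensive query once and caches the result as a CSV file. The function name `export_query_to_csv` and the environment variable name `PLOT_DATA_CSV` are made up for this example, and it uses only the standard library rather than pandas, so treat it as one possible shape under those assumptions rather than the approach the pandoc-plot author had in mind.

```python
import csv
import os
import sqlite3
from contextlib import closing

# Hypothetical variable name: the wrapper script would export it before
# calling pandoc so that the plot blocks can find the cached file.
DATA_ENV_VAR = "PLOT_DATA_CSV"


def export_query_to_csv(db_path, query, csv_path):
    """Run the expensive query once and cache the rows as a CSV file.

    Rows containing NULLs are skipped, roughly mirroring DataFrame.dropna().
    Returns the number of data rows written.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(query)
        header = [col[0] for col in cursor.description]
        rows = [row for row in cursor if None not in row]

    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

The script would call `export_query_to_csv('somedb.db', query, os.environ[DATA_ENV_VAR])` once before pandoc runs. Each plot block can then replace the `sqlite3` and `read_sql_query` lines with `df = pd.read_csv(os.environ["PLOT_DATA_CSV"])`, assuming the wrapper script exported that variable. Reading a small CSV in every block is still repeated work, but it is usually far cheaper than re-running the query.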