python ggplot2 visualization altair plotnine

Building plots with plotnine and Python

I am looking for a way to modify a plot by adding to an existing plot object. For example, I want to add annotations at particular dates in a work plot, but I want a standard way of building the base chart and then "adding" annotations later if I decide to.

Example of what I want to do (using Altair)

It might be easier to show what I want using Altair as an example:

import altair as alt
import pandas as pd

data = pd.DataFrame([
    {'x': 0, 'y': 0},
    {'x': 2, 'y': 1},
    {'x': 3, 'y': 4}
])

events = pd.DataFrame([
    {'x': 1, 'y': 3, 'label': 'the one'}
])

base = alt.Chart(data).mark_line().encode(
    x='x',
    y='y'
) 

annotate = (
  alt.Chart(events).mark_text().encode(
    x='x',
    y='y',
    text='label'
  ) + alt.Chart(events).mark_rule().encode(
    x='x', color=alt.value('red')
  )
)

# display
base + annotate

This produces the plot I want.

I can also wrap the pieces in functions:

def make_base_plot(data):
    return alt.Chart(data).mark_line().encode(x='x', y='y')

def make_annotations(events):
    return (
      alt.Chart(events).mark_text().encode(
        x='x',
        y='y',
        text='label'
      ) + alt.Chart(events).mark_rule().encode(
        x='x', color=alt.value('red')
      )
    )

This lets me plot the base data without annotations, supply the annotation message later, or edit the events to suit the audience the plot is intended for.

Same example using plotnine

If I want to do this "all at once", here is how I would create the plot:

import pandas as pd
import plotnine as p9

data = pd.DataFrame([
    {'x': 0, 'y': 0},
    {'x': 2, 'y': 1},
    {'x': 3, 'y': 4}
])

events = pd.DataFrame([
    {'x': 1, 'y': 3, 'label': 'the one'}
])

(
    p9.ggplot(data, p9.aes(x='x', y='y'))
    + p9.geom_line(color='blue')
    + p9.theme_bw()
    + p9.geom_text(mapping=p9.aes(x='x', y='y', label='label'), data=events)
    + p9.geom_vline(mapping=p9.aes(xintercept='x'), data=events, color='red')
)

However, the two "natural" attempts to decompose this fail:

...

# This part is fine
base = (
    p9.ggplot(data, p9.aes(x='x', y='y'))
    + p9.geom_line(color='blue')
    + p9.theme_bw()
)
# So is this
annotations = (
    p9.ggplot(data, p9.aes(x='x', y='y'))
    + p9.geom_text(mapping=p9.aes(x='x', y='y', label='label'), data=events)
    + p9.geom_vline(mapping=p9.aes(xintercept='x'), data=events, color='red')
)
# This fails
base + annotations

# Error message
# AttributeError: 'ggplot' object has no attribute '__radd__'

Dropping the leading p9.ggplot object from annotations does not help either: in that case, creating the annotations object itself fails.

My question is, how do I decompose the grammar of graphics in plotnine so I can have functions create common components that I can compose, similar to Altair?

I know one alternative is a single function that takes both inputs (data and events) and builds everything in one pass. But then, when I create a template for a graph, I have to anticipate every annotation I might want to make in the future.

Solution

As in R's ggplot2, you can collect components in a list and add that list to a plot. This lets you split the creation of a plot into several parts, where each part consists of multiple components or layers:

import plotnine as p9
import pandas as pd

data = pd.DataFrame([
    {'x': 0, 'y': 0},
    {'x': 2, 'y': 1},
    {'x': 3, 'y': 4}
])

events = pd.DataFrame([
    {'x': 1, 'y': 3, 'label': 'the one'}
])

base = (
    p9.ggplot(data, p9.aes(x='x', y='y'))
    + p9.geom_line(color='blue')
    + p9.theme_bw()
)

annotations = [
    p9.geom_text(mapping=p9.aes(x='x', y='y', label='label'), data=events),
    p9.geom_vline(mapping=p9.aes(xintercept='x'), data=events, color='red')
]

base + annotations
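One way to read the AttributeError from the question is that the plot's `+` hands the work to the right-hand operand's `__radd__`, which layers provide and a ggplot object does not. The toy classes below are hypothetical stand-ins (`Plot` and `Layer` are illustrative names, not plotnine's actual code) that mimic this protocol. They only sketch why a list of layers can be added where a second plot object cannot:

```python
import copy


class Layer:
    """Hypothetical stand-in for a geom or theme."""

    def __init__(self, name):
        self.name = name

    def __radd__(self, plot):
        # A layer knows how to attach itself to a plot.
        plot.layers.append(self.name)
        return plot


class Plot:
    """Hypothetical stand-in for a plot object; note it defines no __radd__."""

    def __init__(self):
        self.layers = []

    def __add__(self, other):
        new = copy.deepcopy(self)  # leave the original plot untouched
        if isinstance(other, list):
            # A list is unpacked and each item attaches itself in turn.
            for item in other:
                item.__radd__(new)
            return new
        return other.__radd__(new)


base = Plot() + Layer("line") + Layer("theme")
annotations = [Layer("text"), Layer("vline")]
combined = base + annotations

print(combined.layers)  # layers of the combined plot
print(base.layers)      # the base plot is unchanged
```

In this toy model, `base + Plot()` raises an AttributeError because `Plot` has no `__radd__`, which mirrors the failure of `base + annotations` when both operands are plot objects.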

Hence, you can rewrite your Altair functions in plotnine like so:

def make_base_plot(data, color='blue'):
    return (
        p9.ggplot(data, p9.aes(x='x', y='y'))
        + p9.geom_line(color=color)
        + p9.theme_bw()
    )

def make_annotations(events, color='red'):
    return [
        p9.geom_text(mapping=p9.aes(x='x', y='y', label='label'), data=events),
        p9.geom_vline(mapping=p9.aes(xintercept='x'), data=events, color=color)
    ]

(
    make_base_plot(data, 'red')
    + make_annotations(events, 'blue')
)
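Because each group of annotations is an ordinary Python list, several groups can be merged before they are added to the base plot, so a template does not need to anticipate every annotation up front. A minimal sketch, assuming a hypothetical helper `combine_layers` (not part of plotnine) and using strings as placeholders for geoms so that it runs without plotnine installed:

```python
from itertools import chain


def combine_layers(*groups):
    """Merge several optional layer lists into one flat list.

    Hypothetical helper, not part of plotnine. A group that is None is
    skipped, so callers can pass annotations only when they have them.
    """
    return list(chain.from_iterable(g for g in groups if g is not None))


# Strings stand in for geoms such as p9.geom_text(...) or p9.geom_vline(...)
event_layers = ["geom_text", "geom_vline"]
audience_layers = None  # nothing extra for this audience
extra_layers = ["geom_hline"]

layers = combine_layers(event_layers, audience_layers, extra_layers)
print(layers)
```

With real geoms, the merged list could then be added in one step, for example `make_base_plot(data) + combine_layers(make_annotations(events), extra)`, where `extra` is any further list of layers you decide on later.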