Tags: python, pandas, dataframe, ggplot2, plotnine

How to define groups in `plotnine` using interactions of features


I cannot find how to do the equivalent of ggplot's interaction feature.

What I'd like to do

This site has an example that has the sort of thing I am looking to reproduce: I have two discrete factors (in the example, store and promotion), and I would like to produce a line plot with each unique combination making a "group" to connect the lines with.

Example in ggplot

Taking it from the example above:

#create line plot with values grouped by store and promo
ggplot(df, aes(x=week, y=sales, color=store, shape=promo,
               group=interaction(store, promo))) + 
  geom_point(size=3) +
  geom_line()

What I'd expect to work in plotnine

import plotnine as p9
import pandas as pd

df = pd.DataFrame({
    'week': [1,2,3,4]*4,
    'store': (["A"]*8 + ["B"]*8),
    'promo': (["promo1"]*4 + ["promo2"]*4)*2,
    'sales': [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9]
})

#create line plot with values grouped by store and promo
(
  p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
               group="interaction(store, promo)")) + 
  p9.geom_point(size=3) +
  p9.geom_line()
)

This breaks because plotnine doesn't know what interaction is. To get the plot, I have to create a column in the data solely to define the groups:

# ... as before

df['group_col'] = df.apply(lambda row: (row.store, row.promo), axis=1)
(
  p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
               group="group_col")) + 
  p9.geom_point(size=3) +
  p9.geom_line()
)

This isn't great, because it means I am manually creating (and then tidying up) columns for every plot grouping that makes sense, rather than using the R-style approach, where the interaction happens inside p9 and doesn't alter the actual data frame.

Addendum for interaction comment

An answer below stated that I wanted to use interaction, which is an R feature and isn't in Python (and thus wasn't part of a plotnine question). That isn't what I was asking for; in the expectation section I was expecting to use interaction in a similar way to how I use factor.

In more detail: the expectation part didn't use interaction as a Python function call; it was part of a string expression:

       .... group="interaction(store, promo)"))

The python interpreter never sees a call to interaction, it only sees a string. This is similar to how factor works in plotnine -- factor is a base R function, but we can use

       .... color="factor(numeric_id)"

in a plotnine geom, and it will interpret the numeric_id as a discrete variable, rather than a continuous one.
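
The mechanism described above can be sketched in a few lines. This is only an illustration of evaluating a string against a data frame's columns, not plotnine's actual implementation; `evaluate_mapping` is a hypothetical helper, and the three-row frame is made up for the example.

```python
import pandas as pd

# Small made-up frame, just for this illustration.
df = pd.DataFrame({
    "store": ["A", "A", "B"],
    "promo": ["promo1", "promo2", "promo1"],
})

def evaluate_mapping(expr, data):
    """Hypothetical helper: evaluate a string with the columns as variables."""
    namespace = {col: data[col] for col in data.columns}
    # Builtins are emptied so that only column names resolve.
    return eval(expr, {"__builtins__": {}}, namespace)

# Columns are pandas Series, so "+" concatenates the strings row by row.
groups = evaluate_mapping("store + '.' + promo", df)
print(list(groups))

# A name that is neither a column nor otherwise provided fails on lookup.
try:
    evaluate_mapping("interaction(store, promo)", df)
    interaction_resolved = True
except NameError:
    interaction_resolved = False
print(interaction_resolved)
```

In this sketch the string is only parsed when the helper evaluates it, so a name like interaction works only if something in the evaluation namespace supplies it.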


Solution

  • As week is numeric, adding a group aes isn't necessary. For this reason I slightly modified the example data by making week a character column, so that we end up with an example where both ggplot2 and plotnine require explicitly mapping the group aes to get a line.

    Based on the modified example, a possible workaround, or an alternative to interaction that I use quite often in ggplot2, is to concatenate the columns: in R I would do group = paste(store, promo, sep = "."), and in plotnine we can do group="store + '.' + promo". (Adding a separator is not actually required, but I have made it a habit so that the result resembles the output of interaction.)

    import plotnine as p9
    import pandas as pd
    
    df = pd.DataFrame({
        'week': ['1', '2', '3', '4']*4,
        'store': (["A"]*8 + ["B"]*8),
        'promo': (["promo1"]*4 + ["promo2"]*4)*2,
        'sales': [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9]
    })
    
    #create line plot with values grouped by store and promo
    (
      p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
                group="store + '.' + promo")) + 
      p9.geom_point(size=3) +
      p9.geom_line()
    )
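
If the same concatenation is needed in several plots, it could be wrapped in a small helper. The sketch below is not part of plotnine: `interaction` here is a hypothetical pandas-only function, named after the R one, that builds one group label per row without adding a column to the data frame.

```python
import pandas as pd

def interaction(*columns, sep="."):
    """Hypothetical helper (not a plotnine function): one label per row."""
    if not columns:
        raise ValueError("interaction() needs at least one column")
    # Cast to str so numeric columns can be combined with string ones.
    parts = [pd.Series(col).astype(str) for col in columns]
    labels = parts[0]
    for part in parts[1:]:
        labels = labels + sep + part
    return labels

df = pd.DataFrame({
    'week': ['1', '2', '3', '4']*4,
    'store': (["A"]*8 + ["B"]*8),
    'promo': (["promo1"]*4 + ["promo2"]*4)*2,
    'sales': [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9]
})

# Build the labels on the fly; df itself gains no extra column.
groups = interaction(df["store"], df["promo"])
print(groups.unique())
```

Since plotnine accepts Python expressions in aes strings (as group="store + '.' + promo" shows), a helper defined in the scope where the plot is built might also be usable as group="interaction(store, promo)". That is untested here and may depend on the plotnine version.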
    
