I cannot find how to do the equivalent of ggplot's interaction feature in plotnine.
This site has an example that has the sort of thing I am looking to reproduce: I have two discrete factors (in the example, store and promotion), and I would like to produce a line plot with each unique combination making a "group" to connect the lines with.
Taking it from the example above:
# create line plot with values grouped by store and promo
ggplot(df, aes(x=week, y=sales, color=store, shape=promo,
               group=interaction(store, promo))) +
  geom_point(size=3) +
  geom_line()
plotnine
import plotnine as p9
import pandas as pd
df = pd.DataFrame({
    'week': [1, 2, 3, 4] * 4,
    'store': ["A"] * 8 + ["B"] * 8,
    'promo': (["promo1"] * 4 + ["promo2"] * 4) * 2,
    'sales': [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9]
})
# create line plot with values grouped by store and promo
(
    p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
                         group="interaction(store, promo)")) +
    p9.geom_point(size=3) +
    p9.geom_line()
)
This breaks because plotnine doesn't know what interaction is. If I want the plot, I have to create a column in the data solely for defining the groups:
# ... as before
df['group_col'] = df.apply(lambda row: (row.store, row.promo), axis=1)
(
    p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
                         group="group_col")) +
    p9.geom_point(size=3) +
    p9.geom_line()
)
This isn't great, because it means I am manually creating (and then tidying up) columns for every plot grouping that makes sense, rather than following the R-style approach, where the interaction happens inside the plotting call and doesn't alter the actual data frame.
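One way to at least keep the original data frame untouched is to build the grouping column on a temporary copy. The following is only a minimal sketch using pandas' `DataFrame.assign`; the names `plot_df` and `group_col` are illustrative, and it still creates a column, just not on `df` itself.

```python
import pandas as pd

df = pd.DataFrame({
    "week": [1, 2, 3, 4] * 4,
    "store": ["A"] * 8 + ["B"] * 8,
    "promo": (["promo1"] * 4 + ["promo2"] * 4) * 2,
    "sales": [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9],
})

# assign() returns a new frame; `df` itself keeps its original columns.
plot_df = df.assign(group_col=df["store"] + "." + df["promo"])

# Show the distinct grouping keys that were created.
print(plot_df["group_col"].unique())
```

`plot_df` could then be passed to `p9.ggplot` in place of `df`, with `group="group_col"` in the mapping.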
interaction comment
An answer below stated that I wanted to use interaction, which is an R feature and isn't available in Python (and thus isn't part of a plotnine question). That wasn't what was being asked; in the expectation section I was expecting to use interaction in a similar way to how I use factor.
In more detail: the expectation part didn't use interaction as a Python function call; it was written inside a string that plotnine evaluates:
.... group="interaction(store, promo)"))
The Python interpreter never sees a call to interaction; it only sees a string. This is similar to how factor works in plotnine: factor is a base R function, but we can use
.... color="factor(numeric_id)"
in a plotnine aes mapping, and it will interpret numeric_id as a discrete variable rather than a continuous one.
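To illustrate the idea of a string being evaluated against the data, here is a toy sketch. It is not plotnine's actual implementation: `evaluate_mapping` is a hypothetical function, and the `factor` it exposes is an assumed, simplified stand-in. It only shows why a name like factor can work inside a string while interaction fails when the evaluator does not define it.

```python
import pandas as pd

df = pd.DataFrame({
    "numeric_id": [1, 2, 1, 2],
    "store": ["A", "A", "B", "B"],
    "promo": ["promo1", "promo2", "promo1", "promo2"],
})

def evaluate_mapping(expression, data):
    """Toy stand-in for an aes evaluator (not plotnine's real code)."""
    # Names the evaluator chooses to expose to expressions.
    namespace = {"factor": lambda col: pd.Categorical(col)}
    # Each column becomes a variable the expression can refer to.
    namespace.update({name: data[name] for name in data.columns})
    return eval(expression, {}, namespace)

# Works: `factor` is a name the evaluator provides.
discrete_ids = evaluate_mapping("factor(numeric_id)", df)

# Works: plain string concatenation of two columns.
group_keys = evaluate_mapping("store + '.' + promo", df)

# Fails: nothing called `interaction` exists in the namespace.
try:
    evaluate_mapping("interaction(store, promo)", df)
    interaction_error = None
except NameError as err:
    interaction_error = err

print(type(interaction_error).__name__)
```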
As week is numeric, adding a group aes isn't necessary. For this reason I slightly modified the example data by making week a character column, so that we end up with an example where both ggplot2 and plotnine require explicitly mapping the group aes to get a line.
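The reason the character version needs an explicit group can be sketched with pandas alone. This assumes the default grouping is built from every discrete variable in the mapping, which is how ggplot2 documents it and which plotnine appears to mirror; the variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "week": ["1", "2", "3", "4"] * 4,
    "store": ["A"] * 8 + ["B"] * 8,
    "promo": (["promo1"] * 4 + ["promo2"] * 4) * 2,
    "sales": [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9],
})

# With week discrete, the implicit grouping would also split on week,
# so every group holds a single point and no line can be drawn.
implicit_sizes = df.groupby(["week", "store", "promo"]).size()

# Grouping only on store and promo gives four groups of four points each.
explicit_sizes = df.groupby(["store", "promo"]).size()

print(implicit_sizes.max(), explicit_sizes.tolist())
```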
Based on the modified example, a possible workaround, or an alternative to interaction that I use quite often in ggplot2, is to concatenate the columns, i.e. in R I would do group = paste(store, promo, sep = ".") and in plotnine we can do group="store + '.' + promo". (Adding a separator is not actually required, but I made it a habit to add it to resemble the output of interaction.)
import plotnine as p9
import pandas as pd
df = pd.DataFrame({
    'week': ['1', '2', '3', '4'] * 4,
    'store': ["A"] * 8 + ["B"] * 8,
    'promo': (["promo1"] * 4 + ["promo2"] * 4) * 2,
    'sales': [1, 2, 6, 7, 2, 3, 5, 6, 3, 4, 7, 8, 3, 5, 8, 9]
})
# create line plot with values grouped by store and promo
(
    p9.ggplot(df, p9.aes(x="week", y="sales", color="store", shape="promo",
                         group="store + '.' + promo")) +
    p9.geom_point(size=3) +
    p9.geom_line()
)
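If typing the concatenation gets repetitive, the same idea can be wrapped in a small helper. This is only a sketch: the `interaction` function below is hypothetical, written here for illustration and not part of plotnine or pandas. It joins any number of columns with a separator, mimicking the labels R's interaction produces.

```python
import pandas as pd

def interaction(*columns, sep="."):
    """Hypothetical helper: join several columns into one grouping key."""
    # Convert each column to strings so that non-string columns also work.
    parts = [pd.Series(col).astype(str) for col in columns]
    key = parts[0]
    for part in parts[1:]:
        key = key + sep + part
    return key

df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "promo": ["promo1", "promo2", "promo1", "promo2"],
})

keys = interaction(df["store"], df["promo"])
print(keys.tolist())
```

Whether group="interaction(store, promo)" would pick up such a helper depends on plotnine resolving names in aes strings from the calling scope, which is an assumption I have not verified here. The helper can always be called directly to build the key before plotting.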