I want to create a histogram for my data table dt, grouped by acquiYear, where the y-axis represents nrOrders and the x-axis the month. My data table looks like this:
dt <- structure(list(acquiYear = c("2014", "2014", "2014", "2014", "2014", "2014",
"2014", "2014", "2014", "2014", "2014", "2014", "2015", "2015",
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015",
"2015", "2015", "2016", "2016", "2016", "2016", "2016", "2016",
"2016", "2016", "2016", "2016", "2016", "2016", "2017", "2017",
"2017", "2017", "2017", "2017", "2017", "2017", "2017", "2017",
"2017", "2017", "2018", "2018", "2018", "2018", "2018", "2018",
"2018", "2018", "2018", "2018", "2018", "2018"), month = structure(c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("Jan",
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec"), class = "factor"), nrOrders = c(0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 2, 0, 2, 4, 5, 3, 7, 3, 5, 4, 3, 7, 8, 7, 2, 24,
16, 33, 9, 27, 16, 10, 27, 9, 31, 35, 11, 11, 25, 15, 18, 19,
19, 8, 27, 34, 43, 51, 0, 11, 2, 0, 0, 0, 0, 0, 4, 5, 1, 0)
), row.names = c(NA, -60L), class = c("data.table",
"data.frame"))
I need, for each acquiYear, a bar for every month, and for each acquiYear a density line over the months. The colors for the years should be c("#00943C", "#4A52A0", "#FDC300", "#6F6F6F", "#EC4C24").
How can I fix this?
The problem is that what you are describing is not a histogram. A histogram is a way to show the distribution of a single continuous variable. Typically, the range of this variable is shown along the x axis, and the axis is split into fixed-width bins. A bar is constructed for each bin where the height of the bar on the y axis shows the count or proportion of observations that lie within that bin.
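To make the distinction concrete, here is a minimal base R sketch of what a histogram actually computes: one continuous variable, fixed-width bins, and a count per bin. The values in x are made-up illustration data, not taken from the question; in ggplot2 the corresponding plot would be drawn with geom_histogram().

```r
# Hypothetical raw measurements of one continuous variable (illustration only)
x <- c(1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 3.8)

# Split the range into fixed-width bins [1, 2], (2, 3], (3, 4] and count how
# many observations fall into each; plot = FALSE returns the numbers only
h <- hist(x, breaks = 1:4, plot = FALSE)

h$breaks  # bin edges: 1 2 3 4
h$counts  # observations per bin: 2 3 2
```

The bar heights come from counting rows, which is why a histogram needs raw observations rather than pre-aggregated totals such as nrOrders.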
What you have is observations of three variables: the month, the year and the number of orders. You wish to show the number of orders on the y axis as a function of month, and also display the year as a grouping variable. It therefore appears that you are looking for a dodged bar chart. Perhaps something like this:
library(ggplot2)
ggplot(dt, aes(month, nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge') +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24"))
Similarly, adding a density curve for each year doesn't make any sense here. A density curve shows the density of measurements of a single variable over a continuous range (a bit like a smoothed histogram), whereas you have equally-spaced measurements that are already fully described by the bars.
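For comparison, a density curve is likewise estimated from a vector of raw observations of one variable. Below is a minimal sketch with base R's density(), again on made-up values; the bandwidth bw = 0.3 is an arbitrary choice for illustration. The estimate is scaled so that the area under the curve is approximately 1, which is what makes it a smoothed histogram rather than a line drawn through existing totals.

```r
# Hypothetical raw observations of a single continuous variable
x <- c(1.2, 1.7, 2.1, 2.4, 2.9, 3.3, 3.8)

# Gaussian kernel density estimate on a regular grid (512 points by default)
d <- density(x, bw = 0.3)

# Approximate the area under the curve with a simple left Riemann sum
area <- sum(diff(d$x) * head(d$y, -1))
area  # close to 1
```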
You could add a smooth curve for each of the years, but the plot is already complex and the curves would not add any information; in fact, they would obscure the data that your plot already shows:
# stat_xspline() comes from the ggalt package (install.packages("ggalt"))
ggplot(dt, aes(as.numeric(month), nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge') +
  ggalt::stat_xspline(geom = 'area', spline_shape = -0.4, alpha = 0.3) +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24")) +
  # month is plotted as 1..12 so a spline can be fitted; relabel the ticks
  scale_x_continuous(breaks = 1:12, labels = month.abb)
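The spline needs a numeric x axis, which is why month is converted with as.numeric() and the ticks are relabelled with month.abb. This only lines up because the factor levels in the question's data are in calendar order (Jan to Dec); with factor()'s default alphabetical levels, the integer codes would not be month numbers. A small base R check, using made-up month values:

```r
# Levels in calendar order, as in the question's data: codes equal month numbers
m_ok <- factor(c("Feb", "Dec", "Jan"), levels = month.abb)
as.numeric(m_ok)             # 2 12 1
month.abb[as.numeric(m_ok)]  # "Feb" "Dec" "Jan"

# Default factor() sorts the levels alphabetically (Dec, Feb, Jan)
m_bad <- factor(c("Feb", "Dec", "Jan"))
as.numeric(m_bad)            # 2 1 3, which are not month numbers
```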
If you really want to do this, you may find that faceting gives a clearer picture:
# stat_xspline() comes from the ggalt package (install.packages("ggalt"))
ggplot(dt, aes(as.numeric(month), nrOrders, fill = acquiYear)) +
  geom_col(position = 'dodge', width = 0.5) +
  ggalt::stat_xspline(geom = 'area', spline_shape = -0.4, alpha = 0.5) +
  xlab("Month") +
  ylab("Nr. of Orders") +
  ggtitle("Delivery year 2018") +
  # one panel per year, stacked vertically
  facet_wrap(~ acquiYear, ncol = 1) +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme(axis.title = element_text(face = "bold")) +
  scale_fill_manual(values = c("#00943C", "#4A52A0", "#FDC300",
                               "#6F6F6F", "#EC4C24")) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
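In the plots above, scale_fill_manual() assigns the colours by position, i.e. in the order of the acquiYear values (2014 to 2018). If you want each year pinned to a specific colour even when a subset of the data does not contain every year, scale_fill_manual() also accepts a named vector and matches by name. The pairing of years to colours below simply assumes the order given in the question; year_cols is a hypothetical name.

```r
# Name each colour after the year it should represent (order as in the question)
year_cols <- setNames(
  c("#00943C", "#4A52A0", "#FDC300", "#6F6F6F", "#EC4C24"),
  as.character(2014:2018)
)

# Lookup is by name, not by position
year_cols[["2016"]]           # "#FDC300"
year_cols[c("2018", "2014")]  # "#EC4C24" "#00943C"
```

You would then pass scale_fill_manual(values = year_cols) in place of the unnamed vector.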