Part 1: We are trying to make a ggplot()
that shows multiple different summary statistics depending on the x-axis. We have more or less managed to do this, but are open to suggestions to help with efficiency.
Time
0: have error bars bounded from 6 g to 20 gTime
1 to 8: mean +/- 1 standard deviation (SD)Time
9 to 12: mean +/- 2 SDTime
>13: no error barsPart 2: The data we are actually working on has Times
up to 3000. Because of this, we would like to include a inset (subplot
) in the upper left hand side of the plot for Time>=6, Time<=10
.
Part 3: In addition to both things above, we want to then remove any data points that are outside the error bars (so that we can show a "before" (with all the points, including those outside the bounds of the error bars) and "after" plot (only points inside the bounds of the error bars for up to and including Time==12
)).
For the purposes of reproducibility, I am using a dataset from R to illustrate my question. The dataset:
library(datasets)
data(ChickWeight) #importing data from base R
summary(ChickWeight)
weight Time Chick Diet
Min. : 35.0 Min. : 0.00 13 : 12 1:220
1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
Median :103.0 Median :10.00 20 : 12 3:120
Mean :121.8 Mean :10.72 10 : 12 4:118
3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
Max. :373.0 Max. :21.00 19 : 12
(Other):506
I have been able to do some of this by first creating vectors (one for ymin
and ymax
in geom_errorbar
). The code to do this is at the bottom of the question. We are open to suggestions on how to do this more efficiently.
We then try to put all this together in one ggplot()
like so (excluding unneeded formatting):
#Import required package:
library(ggplot2)
ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
geom_jitter(color="grey", width=0.1)+
geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, size=1)+
stat_summary(
geom = "point",
fun.y = "mean",
col = "blue",
size = 2,
shape = 19,
fill = "blue")
This generates:
How do we add a subplot()
to this ggplot()
? AND How do we then go on to delete the points outside the standard deviation parameters set above?
So, the final product should be two figures, one plot and subplot with all the data points and another plot and subplot without the deleted points.
Code to make the different standard deviations for each age group. Note, we are open to suggestions about streamlining this.
#loading required package
library(dplyr)
library(pracma)
#Creates a table that includes the SD of each age and the mean of each age
merge_stats <- ChickWeight %>%
arrange(Time) %>%
group_by(Time) %>%
mutate(MEAN=mean(weight), SD = sd(weight), SDt=2*sd(weight)) #add to data
#Vector for Time==0:
merge_stats_age_zero <- merge_stats %>%
filter(Time==0)
vl <- length(merge_stats_age_zero$weight)
MSZUL=linspace(20, 20, vl) #Vector for top bound
MSZLL=linspace(6, 6, vl) #Vector for bottom bound
#Vector for Time>=1, Time<=8:
mergesaot <- merge_stats %>%
filter(Time>=1, Time<=8)
#vectors for +/- 1 SD for Time>=1, Time<=8:
otoerr = mergesaot$MEAN+mergesaot$SD
otberr = mergesaot$MEAN-mergesaot$SD
#Vector for Time>8, Time<=12:
mergesef <- merge_stats %>%
filter(Time>8) %>%
filter(Time<=12)
#vectors for +/- 2 SD for Time>8, Time<=12:
efoerr <- mergesef$MEAN+mergesef$SDt
efberr <- mergesef$MEAN-mergesef$SDt
#Combining vectors together:
LSDabove <- c(MSZUL ,otoerr, efoerr)
LSDbelow <- c(MSZLL ,otberr, efberr)
#To generate the final vector we need to first find its length. This is done by subtracting the length of the total by the three added together.
m_swt <- c(merge_stats$SD)
finpeice <- length(m_swt) - length(LSDabove)
#Knowing the length we will generate a vector of zeros to represent no error bars and to cover the remaining length of our errorbar vectors
finpeiceVec <- linspace(0, 0, finpeice)
#Finaly we have generated our two vectors to represent our error bars
SDabove <- c(MSZUL ,otoerr, efoerr, finpeiceVec)
SDbelow <- c(MSZLL ,otberr, efberr, finpeiceVec)
This is really two questions. The one about removing points outside the error bars really just comes down to filtering your data after the summary stats are created on the main data set. If you struggle with this then a dedicated question might be better. I will show here how to inset a subplot using grid
, on which ggplot2
is built:
subset1 <- which(merge_stats$Time >= 6 & merge_stats$Time <= 10)
p1 <- ggplot(merge_stats[subset1, ],
aes(y = weight, x = as.numeric(Time))) +
geom_jitter(color="grey", width=0.1)+
geom_errorbar(aes(ymin=SDbelow[subset1], ymax=SDabove[subset1]), width=0.1, size=1)+
stat_summary(
geom = "point",
fun.y = "mean",
col = "blue",
size = 2,
shape = 19,
fill = "blue")
inset <- ggplotGrob(p1)
ggplot(merge_stats, aes(y = weight, x = as.numeric(Time))) +
geom_jitter(color="grey", width=0.1)+
geom_errorbar(aes(ymin=SDbelow, ymax=SDabove), width=0.1, size=1)+
stat_summary(
geom = "point",
fun.y = "mean",
col = "blue",
size = 2,
shape = 19,
fill = "blue")
vp <- grid::viewport(width = 0.4, height = 0.4, x = 0.3, y = 0.7)
print(p1, vp = vp)