I have a (very) multivariate time series, say 25 observations at different timepoints, for at least 15 different variables. I would like to plot all observations with respect to time, and also plot a mean line and a confidence interval for each timepoint (where the mean and standard deviation are computed across the different variables at each timepoint).
I find that I can do these tasks separately:
library(ggplot2)
nr <- 25
nc <- 15
dat <- data.frame(matrix(data=rnorm(nr*nc), nrow=nr, ncol = nc))
year <- seq(2000, 2000+3*(nr-1), by = 3)
dat$year <- year
dat_long <- reshape2::melt(data=dat, id.vars = "year")
ggplot(dat_long, aes(x=year, y = value, col = variable)) + geom_line()
and
sd <- sqrt(unname(apply(dat, 1, var)))
mean <- unname(apply(dat, 1, mean))
upper <- mean + 2*sd
lower <- mean - 2*sd
eb <- aes(ymin=lower, ymax=upper)
ggplot(dat, aes(x=year, y=mean)) + geom_line() + geom_ribbon(eb, alpha=0.5)
but if I add in col=variable to try to get the individual lines for each variable, I no longer get the confidence interval. What might I try instead?
Create another data.frame with the year and the computed statistics, and pass it to geom_ribbon in the data argument.
Note that your code also included year when computing the row-wise mean and standard deviation; the code below excludes it. Also, apply(dat, 1, mean) is replaced by rowMeans.
library(ggplot2)
set.seed(2023)
nr <- 25
nc <- 15
dat <- data.frame(matrix(data=rnorm(nr*nc), nrow=nr, ncol = nc))
year <- seq(2000, 2000+3*(nr-1), by = 3)
dat$year <- year
dat_long <- reshape2::melt(data=dat, id.vars = "year")
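# row-wise statistics over the value columns only (year, the last column, is dropped)
row_sd <- apply(dat[-(nc + 1)], 1, sd)
row_mean <- rowMeans(dat[-(nc + 1)])
upper <- row_mean + 2*row_sd
lower <- row_mean - 2*row_sd
dat_stats <- data.frame(year, mean = row_mean, lower, upper)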
ggplot(dat_long, aes(x=year, y = value, color = variable)) +
  geom_line() +
  geom_ribbon(
    data = dat_stats,
    aes(x = year, ymin = lower, ymax = upper),
    alpha = 0.25,
    inherit.aes = FALSE
  ) +
  theme_bw()
Created on 2023-09-14 with reprex v2.0.2
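The question also asked for a mean line, which the plot above does not draw. A minimal sketch of one way to add it, assuming the same simulated data: the mean goes in as a second geom_line layer that takes its data from dat_stats, just like the ribbon. The names vals, row_mean and row_sd are illustrative and not from the original code, and the long format is built with base R here instead of reshape2::melt only so that the sketch depends on ggplot2 alone. Keep in mind that mean ± 2 sd describes the spread across variables at each timepoint, which is not the same thing as a confidence interval for the mean.

```r
library(ggplot2)

set.seed(2023)
nr <- 25
nc <- 15
vals <- matrix(rnorm(nr * nc), nrow = nr, ncol = nc)
year <- seq(2000, by = 3, length.out = nr)

# Row-wise statistics over the value columns only (year is kept separate).
row_mean <- rowMeans(vals)
row_sd <- apply(vals, 1, sd)
dat_stats <- data.frame(
  year = year,
  mean = row_mean,
  lower = row_mean - 2 * row_sd,
  upper = row_mean + 2 * row_sd
)

# Long format, one row per (year, variable) pair.
# as.vector() unrolls the matrix column by column, which matches the rep() calls.
var_names <- paste0("X", seq_len(nc))
dat_long <- data.frame(
  year = rep(year, times = nc),
  variable = factor(rep(var_names, each = nr), levels = var_names),
  value = as.vector(vals)
)

p <- ggplot(dat_long, aes(x = year, y = value, colour = variable)) +
  geom_line() +
  geom_ribbon(
    data = dat_stats,
    aes(x = year, ymin = lower, ymax = upper),
    alpha = 0.25,
    inherit.aes = FALSE
  ) +
  # Mean across variables, drawn on top in a fixed colour.
  geom_line(
    data = dat_stats,
    aes(x = year, y = mean),
    colour = "black",
    inherit.aes = FALSE
  ) +
  theme_bw()
p
```

Setting inherit.aes = FALSE on the two summary layers keeps them from looking for the variable column, which dat_stats does not have.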