I have data from a questionnaire. It has 10 items, each scored 0 or 1. Due to time pressure, many items, especially the last ones, were not answered; this is intended and counts as a score of 0. However, I want to preserve the NAs so that I can visualize them.
My goal is a plot that shows the jittered raw data points with an overlaid mean and error bars per item. The NA points should be plotted as well, off to the side and in a different color, much like naniar::geom_miss_point() does. I have almost achieved this by overlaying geom_miss_point() and geom_jitter(). See the plots below.
Data setup (not important, just copy and paste):
library(ggplot2)
library(naniar)
set.seed(1)
# create weights for adding NAs later
# items have more NAs if their position is later
weights <- numeric()
for (i in 1:10) {
  weights <- c(weights, rep(i, i))
}
s <- seq(0, 590, by = 10)
na <- s + sample(weights,
                 size = length(s),
                 replace = TRUE)
na2 <- s + sample(weights,
                  size = length(s),
                  replace = TRUE)
na3 <- unique(c(na, na2))
item <- rep(1:10, 60) |> as.factor()
score <- runif(600) |> round()
score[na3] <- NA
id <- rep(1:60, each = 10)
dat <- data.frame(id, item, score)
# compute a separate score where NA are counted as zero
dat$na_score <- dat$score
dat$na_score[is.na(dat$score)] <- 0
NAs not shown, using geom_jitter()
ggplot(dat, aes(y = item, x = score)) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.3) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               aes(x = na_score)) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
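For readers unfamiliar with mean_cl_normal: ggplot2 forwards it to Hmisc::smean.cl.normal(), so the Hmisc package needs to be installed. The error bars are the usual t-based 95% confidence interval around the mean. A minimal base R sketch of that calculation follows; the helper name mean_ci_t and the example vector are made up for illustration.

```r
# Hypothetical helper: t-based confidence interval for the mean,
# returned in the y / ymin / ymax layout that stat_summary(fun.data = ...) expects.
mean_ci_t <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                                   # drop missing values
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)                               # standard error of the mean
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * se     # half-width of the interval
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

# Made-up scores for one item, with NAs already counted as zero
example_scores <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)
mean_ci_t(example_scores)
```

A function with this return shape should also be usable directly as the fun.data argument of stat_summary(), which avoids the Hmisc dependency.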
NAs shown, but jitter is applied only to the NA points and not to the normal data points :( (using geom_miss_point())
This is nice, because it shows how the NAs drag down the mean score.
ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
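How much the NAs drag down the mean can also be checked numerically by comparing, per item, the mean that ignores NAs with the mean that counts them as zero. The sketch below uses a small made-up data frame (toy) rather than dat, so the numbers are purely illustrative; with the real data, dat can be substituted for toy.

```r
# Made-up data: 2 items, 4 respondents each; item 2 has more missing answers
toy <- data.frame(
  item  = factor(rep(1:2, each = 4)),
  score = c(1, 0, 1, NA,   1, NA, NA, 0)
)

# Same recoding as in the question: NA counts as a score of zero
toy$na_score <- ifelse(is.na(toy$score), 0, toy$score)

# Per-item summary: share of NAs, mean ignoring NAs, mean with NA as zero
comparison <- data.frame(
  item          = levels(toy$item),
  na_share      = as.numeric(tapply(is.na(toy$score), toy$item, mean)),
  mean_answered = as.numeric(tapply(toy$score, toy$item, mean, na.rm = TRUE)),
  mean_na_zero  = as.numeric(tapply(toy$na_score, toy$item, mean))
)
comparison
```

The gap between mean_answered and mean_na_zero grows with na_share, which is the effect the red points in the plot make visible.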
Overlaying both geoms gets close to my desired output, but ideally the NA and normal points would have the same jitter, both horizontally and vertically. The main concern here is that the normal data points are plotted twice, once jittered by geom_jitter() and once by geom_miss_point(). This could easily be hidden by tweaking alpha, but I exaggerated it to show the problem here.
ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.1) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
How can I achieve my desired output? ?geom_miss_point mentions the use of ggobi methods to plot two things on the same axis. Maybe this is the way to go.
Perhaps I'm missing something, but one option would be to plot the missing and non-missing data with a single geom_jitter(), without the need for geom_miss_point(). The NAs are first recoded to a placeholder position (-0.1) just left of the valid scores:
library(ggplot2)
# Recode NAs to a placeholder x position just left of the valid scores (0 and 1)
dat$x <- dat$score
dat$x[is.na(dat$x)] <- -.1
ggplot(dat, aes(y = item, x = score)) +
  # One jitter layer for all points; color flags whether the score is missing
  geom_jitter(
    aes(x = x, color = !is.na(score)),
    height = 0.2, width = 0.05, alpha = 0.1
  ) +
  stat_summary(
    fun.data = "mean_cl_normal",
    geom = "errorbar",
    aes(x = na_score)
  ) +
  stat_summary(
    fun = mean,
    geom = "point",
    color = "red",
    size = 1,
    aes(x = na_score)
  ) +
  # FALSE (missing) maps to blue, TRUE (not missing) to black
  scale_color_manual(
    name = "missing",
    values = c("blue", "black"),
    labels = c("Missing", "Not missing")
  ) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
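The hard-coded -.1 works here because the scores span 0 to 1. As far as I understand the naniar documentation, geom_miss_point() places missing values 10% below the minimum of the observed range, so the same idea can be written as a small helper for other scales. This is only a sketch: shift_na_below and its prop argument are hypothetical names, not part of naniar or ggplot2.

```r
# Hypothetical helper: replace NAs with a placeholder value that sits
# `prop` times the observed range below the observed minimum.
shift_na_below <- function(x, prop = 0.1) {
  rng <- range(x, na.rm = TRUE)              # observed minimum and maximum
  placeholder <- rng[1] - prop * diff(rng)   # position just below the minimum
  ifelse(is.na(x), placeholder, x)
}

# For 0/1 scores this reproduces the -0.1 placeholder used in the answer
shift_na_below(c(0, 1, NA, 1, NA))

# On another scale the placeholder moves with the data
shift_na_below(c(10, 20, NA))
```

With the question's data this would be dat$x <- shift_na_below(dat$score), followed by the same ggplot() call as above.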