I'd like to illustrate ICC. I'd love to create a plot where one can see, that with a higher ICC more of the variability in a dependent variable can be attributed to the groups.
I've stumble across this depiction (https://bookdown.org/marklhc/notes_bookdown/hierarchical-multilevel-models.html#multilevel-modeling-mlm):
And I think it does a great job explaining this fact.
Now I've tried to recreate the spirit of this plot with the following code:
ni <- 30
#numbers of student per class
nj <- 15
#numbers of clusters/classes
ICC <- seq(.1,.9,.1)
#starting with 0, then going up .1 stepwise till .9, thus generating 10 values
# what is the random intercept standard deviation given icc
RI_sd <- sqrt(1 / (1-ICC) - 1)
# RI_sd
# -------------------------------------------------------------------------
dgp_grid <- expand.grid(
ni = 1:ni,
nj = 1:nj,
x = NA,
y = NA, # with two predictor variables
RI_sd = 0
)
dgp_grid$U0j <- rep(rnorm(nj, mean = 0, sd = RI_sd), each = ni)
#create level 2 residual (ICC)
dgp_grid$x <- rnorm(ni*nj,mean = 0, sd = 1)
# create level 1 x explanatory/predictor variable (draw from standard normal)
dgp_grid$y <-
dgp_grid$x +
dgp_grid$U0j
# -------------------------------------------------------------------------
library(ggplot2)
# Define the desired range for the y-axis
x_axis_limit <- c(-6, 6)
# Plot the data with fixed y-axis range
ggplot(dgp_grid, aes(x = y, y = nj)) +
geom_jitter(aes(color = factor(nj)), size = 3, width = 0.2) +
geom_boxplot(aes(group = nj), alpha = 0.5, color = "black", width = 0.2) +
theme_minimal() +
coord_flip() +
xlim(x_axis_limit)
I basically create some clusters nj
(one could think of them as classes in, e.g., a school) and then try to weave in a residual term for the ICC. No If I want to have no ICC I would set RI_sd = 0
, and if I want a very big ICC I would set RI_sd = 3
.
This is what the plot looks like with RI_sd = 0
:
And this is waht it looks like with RI_sd = 3
:
I have two questions.
I think it would be best to write a function that takes ICC and outputs a plot similar in style to the linked example. We can even get it to take a vector of ICC values and output a panel for each value.
show_icc <- function(icc = 0.5) {
# These could be function parameters rather than constants if preferred.
group_mean <- 5 # This is arbitrary as there are no numbers on y axis
group_sd <- 1 # This is also arbitrary
n_groups <- 10 # Number of groups along x axis
obs_per_group <- 10 # Number of points per group
# This gives us a vector of means, one for each group
mu <- rnorm(n_groups, group_mean, group_sd)
dat <- lapply(icc, function(icc) {
# Calculate sd within group from icc then generate group samples
sigma <- group_sd^2/icc - group_sd^2
replicate(obs_per_group, rnorm(n_groups, mu, sigma)) |>
t() |>
as.data.frame() |>
stack() |>
setNames(c('value', 'group')) |>
within(icc <- paste('ICC = ', icc))
})
do.call('rbind', dat) |>
ggplot(aes(group, value)) +
geom_point(color = 'gray') +
stat_summary(fun = mean, geom = 'point',
shape = 24, fill = 'red', size = 4) +
facet_wrap(.~icc, scales = 'free_y') +
theme_classic() +
theme(strip.background = element_blank(),
strip.text = element_text(size = 14, face = 2),
axis.text.y = element_blank())
}
This allows
show_icc(c(0.1, 0.7, 0.9))