I am looking at data from repeated experiments with 3 participants (hawks). We recorded the trial number for each individual (about 50 trials each) so that we could account for their learning in the model.
The aim of the study is to assess how a few different fixed effects and their interactions affect the response, which is continuous (distance, in m). Including the random effect is more to account for the experimental design than out of a desire to understand the differences between individuals or their learning. There definitely was a difference between the individuals' performance, and they also did learn and improve over time.
My question is regarding how to correctly structure a random effect for this purpose.
Initially I attempted to nest learning (Order) within the random effect for individual (ID), like so: (1|ID/Order), but got the error "number of levels of each grouping factor must be < number of observations". I think this is because Order is literally just a consecutive count with no repetitions per individual?
This thread https://stats.stackexchange.com/questions/31569/questions-about-how-random-effects-are-specified-in-lmer and Ben Bolker et al.'s awesome FAQ page https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-definition inspired me to try structuring the random effect as (Order|ID), where supposedly Order gives each individual its own slope and ID its own intercept, which SOUNDS like exactly what I want to do. The models appear to work when I run this; I just don't know enough to tell whether it's somehow incorrect, so I'm looking for expert opinion.
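In case it helps, here is roughly what I'm running (simplified: lme4's lmer, with dat and the fixed effect treatment standing in for my real data and predictors):

```r
library(lme4)

# What I tried first; this throws "number of levels of each grouping
# factor must be < number of observations", because Order never repeats
# within an individual:
# m_nested <- lmer(distance ~ treatment + (1 | ID / Order), data = dat)

# What I'm running now: a random intercept per individual, plus a
# per-individual slope for Order:
m_slope <- lmer(distance ~ treatment * Order + (Order | ID), data = dat)
summary(m_slope)
```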
Thanks very much y'all!!
You have it right.
If you fit (1|ID/Order), that treats Order as a categorical grouping variable; since you have at most one observation per ID-Order combination, the ID:Order random effect ((1|ID/Order) expands to (1|ID) + (1|ID:Order)) is confounded with the residual variance in a LMM, which is what the error message is telling you.

If you fit (Order|ID) (which is equivalent to (1+Order|ID)), and assuming that Order is a continuous (numeric) covariate, you are specifying a random-slopes model: you assume that the change in the response is linear with respect to order, but that each individual has a different intercept and slope (and that the individual-level intercepts and slopes are drawn from a bivariate Gaussian distribution). You should make sure to include a fixed effect of Order as well; otherwise you would be assuming that on average hawks don't learn at all, i.e. that the population-average slope is 0.

If you wanted a more flexible learning curve, you could fit ns(Order, df = 4) + ... + (ns(Order, df = 4)|ID), i.e. assume that the learning curve follows a natural spline with 4 degrees of freedom, and that each individual has its own curve. (You can fit these kinds of models slightly more generally/powerfully with mgcv: see Pedersen et al. 2019.)

I will say that since you only have 3 participants, you might run into computational trouble, especially with some of the fancier suggestions above, even though conceptually ID would be treated as a random effect (it makes sense to assume that these three individuals were randomly selected from a larger population, and that you are interested in generalizing from these individuals to the whole population). Practically speaking, there wouldn't be a huge difference between a random-effects model and a similar fixed-effects model that allowed ID to interact with all the covariates that vary within subjects ...
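As a rough sketch of what the suggestions above would look like in code (dat and treatment are placeholders, and with only 3 levels of ID the fancier fits may well fail to converge):

```r
library(lme4)
library(splines)  # for ns()
library(mgcv)     # for gam()/s()

# Population-level learning curve as a natural spline (4 df) for Order,
# plus an individual-specific deviation from that curve:
m_ns <- lmer(distance ~ treatment + ns(Order, df = 4) +
               (ns(Order, df = 4) | ID),
             data = dat)

# A roughly analogous model in mgcv: a global smooth of Order plus
# individual-level smooths via the factor-smooth basis (ID must be a
# factor); see Pedersen et al. 2019 for this class of models.
m_gam <- gam(distance ~ treatment + s(Order) + s(Order, ID, bs = "fs"),
             data = dat, method = "REML")
```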
Pedersen, Eric J., David L. Miller, Gavin L. Simpson, and Noam Ross. 2019. “Hierarchical Generalized Additive Models in Ecology: An Introduction with mgcv.” PeerJ 7 (May): e6876. https://doi.org/10.7717/peerj.6876.