I am running a multivariate model (4 response variables) with two random effects using MCMCglmm(). I am currently using an inverse-Wishart prior.
### current setup for the prior
prior.miw <- list(R = list(V = diag(4), nu = 0.002),
                  G = list(G1 = list(V = diag(4),
                                     nu = 0.002,
                                     alpha.mu = c(0, 0, 0, 0),
                                     alpha.V = diag(4) * 1000),
                           G2 = list(V = diag(4),  # repeated to deal with the second random effect
                                     nu = 0.002,
                                     alpha.mu = c(0, 0, 0, 0),
                                     alpha.V = diag(4) * 1000)))
Based on my model, it appears I should really be using a scaled inverse-Wishart prior (see page 14 of Lemoine 2019). My question: how do I turn this into a prior that is suitable for my MCMCglmm() model? Additionally, if this is NOT the sort of prior I should be running, please feel free to weigh in.
This is a two-part question: what do these scaled (parameter-expanded) priors look like in MCMCglmm, and how do I implement them there? [good for Stack Overflow] Which prior you should actually choose is more of a statistics question, but I'll weigh in on that too.
The general trick is to decompose the covariance matrix into a multivariate component (the correlation matrix, or inverse correlation matrix, or something similar) and a vector of scaling parameters for the standard deviations (or inverse standard deviations), as sketched below. Lemoine suggests U(0,100) priors for the scaling parameters, which I think is bad (why flat?). I can't get to the precise page of Gelman and Hill 2007 where they discuss which distribution to use for scaling priors, but I would be a little surprised if they actually recommended a uniform distribution on the variance scale ...
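To make that decomposition concrete, here's a minimal R sketch (the matrix values are made up):

Sigma <- matrix(c(4, 1.2, 1.2, 9), 2, 2)     # a made-up 2x2 covariance matrix
s <- sqrt(diag(Sigma))                       # standard deviations: the scaling parameters
C <- cov2cor(Sigma)                          # correlation matrix: the multivariate component
all.equal(Sigma, diag(s) %*% C %*% diag(s))  # TRUE: Sigma factors as D %*% C %*% D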
Update, having actually looked at your code (!): I think you're doing the right thing, except that nu=0.002 seems really extreme; see the end of this answer for that discussion.
This is basically what MCMCglmm does, but it uses a different (IMO better) choice for the scaling priors. It sounds scary:

"These priors are all from the non-central scaled F-distribution, which implies the prior for the standard deviation is a non-central folded scaled t-distribution (Gelman, 2006)."

but it boils down to choosing four parameters, only two of which you really have to think about:
- V: the prior mean variance (or the prior mean covariance matrix, if you have a multivariate random-effect term). According to the course notes, "without loss of generality V can be set to one" (or, for a multivariate model, to an identity matrix).
- alpha.mu: we almost always want this to be zero (or, as in your example, a vector of zeros); that way the prior for the standard deviation will be a Student t distribution. (There may be a use case for alpha.mu != 0, but I've never run across it.)
- alpha.V: with V set to 1 (or an identity matrix), this is the prior mean of the covariance matrix. A diagonal matrix with a reasonable scale for your problem is a good choice.
- nu: the shape parameter. As nu → ∞ we get a half-Normal prior for the standard deviations; with nu=1 we get a Cauchy distribution. Smaller values have fatter tails (less conservative, allowing broader samples, but also giving more danger of weird sampling behaviour in the tails).
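For concreteness, here's a minimal univariate sketch of these four parameters (prior.uni is just an illustrative name, and alpha.V = 25^2, giving a half-Cauchy(25) prior on the standard deviation, is an illustrative choice, not a recommendation; as far as I know parameter expansion is only available for the G-structures, so R gets a plain prior):

## univariate parameter-expanded prior: half-Cauchy(25) on the random-effect SD
prior.uni <- list(R = list(V = 1, nu = 1),          # residual: no parameter expansion here
                  G = list(G1 = list(V = 1,         # prior mean variance (WLOG 1)
                                     nu = 1,        # 1 df: Cauchy tails
                                     alpha.mu = 0,  # zero non-centrality
                                     alpha.V = 25^2)))  # SD scale: half-Cauchy(25)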
For the univariate case Hadfield says the t prior with V=1 is
2 * dt(sqrt(v)/sqrt(alpha.V), df = nu,
ncp = alpha.mu/sqrt(alpha.V))
with v restricted to be positive.
Unpacking this:
- if you set alpha.mu=0 like sensible people, the ncp (non-centrality parameter) argument is zero;
- dividing by sqrt(alpha.V) means we scale the t-distribution by sqrt(alpha.V), i.e. a draw from the prior on the standard deviation looks like sqrt(alpha.V)*abs(rt(1, df = nu)).
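To get a feel for what this prior implies, here's a quick sketch (my own illustration, not from Hadfield's notes) that draws standard deviations from the folded, scaled t prior for a few values of nu, using your alpha.V scale of 1000:

## draw SDs from the folded, scaled t prior: sqrt(alpha.V) * |t_nu|
set.seed(101)
alpha.V <- 1000
for (nu in c(1000, 1, 0.002)) {
    sd_samp <- sqrt(alpha.V) * abs(rt(1e5, df = nu))
    cat(sprintf("nu = %g: median SD = %g, 97.5%% quantile = %g\n",
                nu, median(sd_samp),
                quantile(sd_samp, 0.975, names = FALSE)))
}

With nu = 1000 the draws look half-Normal-ish; with nu = 0.002 even the median draw is astronomically large and the upper tail overflows to Inf, which previews the warning below.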
I wish Hadfield had given an example of parameter expansion in a multivariate inverse-Wishart case. Where did you find yours?
I think nu=0.002 is a dangerous choice. Why? It's tempting to make your priors "as flat as possible", but this has two potentially bad consequences:

1. You're also using this tiny value in your residual prior, R=list(V=diag(4), nu=0.002), which is not parameter-expanded. This is OK if your data are informative (i.e. they supply a reasonable guess for the residual variance), but it will sample badly if they're not.
2. For the parameter-expanded terms, remember that nu=1 corresponds to a Cauchy distribution, whose tails are so fat that its mean is infinite; now imagine making the shape parameter 500 times smaller (nu=0.002) ...
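Putting this together, here's a sketch of how I might adjust your prior: same structure as yours, but with less extreme shape parameters (prior.px is just my name for it, and the specific values nu = 4 for the residual and nu = 2 for the G-structures are assumptions to tune against your data, not canonical recommendations):

## same structure as prior.miw, but with less extreme nu values
prior.px <- list(R = list(V = diag(4), nu = 4),  # residual: proper inverse-Wishart needs nu > dim - 1 = 3
                 G = list(G1 = list(V = diag(4),
                                    nu = 2,      # fatter-tailed than half-Normal, thinner than Cauchy
                                    alpha.mu = rep(0, 4),
                                    alpha.V = diag(4) * 1000),
                          G2 = list(V = diag(4),
                                    nu = 2,
                                    alpha.mu = rep(0, 4),
                                    alpha.V = diag(4) * 1000)))

Whatever you pick, it's worth re-running the model with a couple of different nu values and checking that the variance-component posteriors aren't prior-sensitive.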