I've estimated a huge and complex time-varying hierarchical model in JAGS for time series of hundreds of individuals. I estimate time-varying parameters for each individual, leading to 80 parameters per person. The parameters are random effects that follow a multivariate normal distribution. When I try to check convergence of the individual parameters with coda::gelman.diag(samples), my computer runs into memory problems and R crashes after 12 hours. Is it sufficient to check convergence for the means and precisions of the multivariate normal distribution, or is it necessary to check convergence for the individual-level parameters (which follow the above-mentioned distribution)? What is the convention here?
The issue is likely that you are trying to calculate the Gelman-Rubin diagnostic for every parameter at once, and as a result you are running out of RAM. There is a workaround for this: apply the function to one parameter at a time. I'm assuming you have enough RAM to store the results in a matrix, given that your computer can already hold the samples from your model.
Here is a very simple example model in rjags, and how you would work through the parameters one at a time.
library(rjags)
# load the model
data(LINE)
# recompile it
LINE$recompile()
# get some samples
LINE.out <- coda.samples(LINE, c("alpha","beta","sigma"), n.iter=1000)
# Get the number of parameters
npar <- ncol(LINE.out[[1]])
# and the names of the parameters
par_names <- colnames(LINE.out[[1]])
# A matrix to store all of the results
gdag <- matrix(NA, ncol = 2, nrow = npar)
# give rows informative names
row.names(gdag) <- par_names
# give the columns informative names (based on output of gelman.diag())
colnames(gdag) <- c("PointEst", "UpperC.I.")
# progress bar
pb <- txtProgressBar(0, npar)
# for loop to work through things one at a time
for (par in 1:npar) {
  setTxtProgressBar(pb, par)
  tmp <- lapply(LINE.out, function(x) x[, par])
  gdag[par, ] <- gelman.diag(tmp)[[1]]
}
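Once gdag is filled in, you can screen it for parameters that look suspect. A minimal sketch, using the common rule of thumb that a point estimate above roughly 1.1 deserves a closer look (the cutoff is a convention, not something gelman.diag enforces):
# flag parameters whose PSRF point estimate exceeds 1.1 (rule-of-thumb cutoff)
suspect <- gdag[gdag[, "PointEst"] > 1.1, , drop = FALSE]
suspect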
If, for some reason, you can't store everything in gdag, you could just keep appending the results to a csv using write.table(). Something like this should work.
library(rjags)
# load the model
data(LINE)
# recompile it
LINE$recompile()
# get some samples
LINE.out <- coda.samples(LINE, c("alpha","beta","sigma"), n.iter=1000)
# Get the number of parameters
npar <- ncol(LINE.out[[1]])
# and the names of the parameters
par_names <- colnames(LINE.out[[1]])
# Create the header row of the table (the first column will hold the parameter names,
# matching the row names written out inside the loop below)
write.table(
  matrix(c("parameter", "PointEst", "UpperC.I."), ncol = 3, nrow = 1),
  "my_gelman_rubins.csv",
  sep = ",",
  col.names = FALSE,
  row.names = FALSE
)
# progress bar
pb <- txtProgressBar(0, npar)
for (par in 1:npar) {
  setTxtProgressBar(pb, par)
  tmp <- lapply(LINE.out, function(x) x[, par])
  gdag <- gelman.diag(tmp)[[1]]
  row.names(gdag) <- par_names[par]
  write.table(
    gdag,
    "my_gelman_rubins.csv",
    append = TRUE,
    sep = ",",
    col.names = FALSE,
    row.names = TRUE
  )
}
The latter approach will take a little longer, but it has the added benefit of actually saving the results to a file. Thus, if you do have memory allocation issues, you can just resume your diagnostic calculations at whatever parameter you need to start with. For example, if you hit an error on the 501st parameter, you'd just modify the for loop to be for(par in 501:npar).
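If you'd rather not work out the restart index by hand, a minimal sketch of that resume logic could look like the following, assuming the csv produced above (one data row per finished parameter, plus the header row):
# count how many parameters have already been written (read.csv skips the header row)
done <- nrow(read.csv("my_gelman_rubins.csv", header = TRUE))
# pick up at the next unfinished parameter
if (done < npar) {
  for (par in (done + 1):npar) {
    setTxtProgressBar(pb, par)
    tmp <- lapply(LINE.out, function(x) x[, par])
    gdag <- gelman.diag(tmp)[[1]]
    row.names(gdag) <- par_names[par]
    write.table(gdag, "my_gelman_rubins.csv", append = TRUE,
                sep = ",", col.names = FALSE, row.names = TRUE)
  }
}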