Tags: r, bayesian, mcmc, jags, rjags

JAGS in R: Resulting MCMC chain does not start near initial values


I am running the code in this simple rjags tutorial. Notice that the initial values of the parameters are specified by `model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0)`. I wanted to verify that the resulting MCMC chain actually starts at those initial values.

Therefore, for testing purposes, I set `n.adapt=0` within `jags.model()` so that no adaptation takes place. Printing `head(model.samples[[1]])` shows `beta0` at 0.58, `beta1` at 0.97, and `beta2` at 0.54 in the first row. So even after just the first iteration, none of the parameters are anywhere near their initial values.
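
For reference, a minimal sketch of the kind of setup I mean (the tutorial's exact model and data may differ):

    library(rjags)
    
    model_string <- "model{
      for(i in 1:N){
        y[i] ~ dnorm(beta0 + beta1*x1[i] + beta2*x2[i], tau)
      }
      beta0 ~ dnorm(0, .01)
      beta1 ~ dnorm(0, .01)
      beta2 ~ dnorm(0, .01)
      tau ~ dgamma(.01, .01)
    }"
    
    set.seed(1)
    N <- 500; x1 <- rnorm(N); x2 <- rnorm(N)
    y <- 2 + x1 - x2 + rnorm(N)  # made-up data standing in for the tutorial's
    
    model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0)
    model <- jags.model(textConnection(model_string),
                        data = list(y=y, x1=x1, x2=x2, N=N),
                        inits = model.inits,
                        n.adapt = 0)  # no adaptation, for testing
    model.samples <- coda.samples(model, c("beta0","beta1","beta2","tau"),
                                  n.iter = 1000)
    head(model.samples[[1]])  # first row is already past the initial values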

I would appreciate an explanation of whether I'm missing something here.


Solution

  • The problem is that JAGS does not record the initial values as the first of the sampled values. Using your example, I put the model in a file called regression.mod, the data in a file called regression.dat, and the initial values in a file called regression.inits. Then I opened a terminal and started JAGS directly (rather than running it through R, though it works the same way from R) and did the following:

    (base) david@Daves-MacBook-Pro Downloads % jags              
    Welcome to JAGS 4.3.1 (official binary) on Fri Mar 31 07:27:34 2023
    JAGS is free software and comes with ABSOLUTELY NO WARRANTY
    Loading module: basemod: ok
    Loading module: bugs: ok
    . model in regression.mod
    
    . data in regression.dat
    Reading data file regression.dat
    
    . compile
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
    Graph information:
       Observed stochastic nodes: 500
       Unobserved stochastic nodes: 4
       Total graph size: 3023
    
    . parameters in regression.inits
    Reading parameter file regression.inits
    
    . initialize   
    Initializing model
    
    . parameters to actual.txt, chain(1)
    

    The output of `parameters to` shows the current state of the initialized model, which confirms that the initial values were set appropriately:

    `.RNG.name` <- "base::Wichmann-Hill"
    `.RNG.state` <- 
    c(16850L,14297L,4084L)
    `beta0` <- 
    1
    `beta1` <- 
    0
    `beta2` <- 
    0
    `tau` <- 
    1
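
    As an aside, the `.RNG.name` and `.RNG.state` entries are the state of the sampler's random number generator, which JAGS stores alongside the parameters. From R, the rjags documentation notes that you can pin the RNG down through the inits list for reproducibility, e.g.:

    model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0,
                        .RNG.name="base::Wichmann-Hill", .RNG.seed=1)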
    

    Next, I took one draw from the posterior and saved the values for that one draw:

    . monitor tau      
    . monitor beta1
    . monitor beta2
    . monitor beta0
    . update 1
    Updating 1
    
    . coda *, stem(samp1_)
    

    The sampled values from that single draw were:

    tau    0.336252
    beta1  0.896834
    beta2  0.583655
    beta0  0.393297
    

    The other telltale sign is that if you set monitors immediately after initialization and save the chain values, the files are empty, meaning that JAGS does not include the initial values among the sampled values.
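
    You can run the same check from R without leaving rjags: the object returned by `jags.model()` exposes the current chain state through its `state()` method. A minimal sketch, reusing the `model` and `model.inits` names from the question:

    # Immediately after jags.model(..., inits = model.inits, n.adapt = 0):
    model$state()  # current chain state: should match model.inits
    
    one_draw <- coda.samples(model, c("beta0","beta1","beta2","tau"), n.iter=1)
    one_draw[[1]][1, ]  # one update later: no longer the initial values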


    Edit: answering questions from the comments

    The question in the comments is: shouldn't the chain move away from the initial values more slowly? The answer is, it depends. In particular, it depends on the correlation structure of the X variables (and, by extension, the correlation structure of the parameters, and whether that structure is built into the model). Below is a simple two-predictor regression problem (without an intercept). In one model the x variables are uncorrelated; in the other, they correlate at 0.93 (in the population). The priors on the regression coefficients are independent normals. I started the b values at (-25, 25), far away from their true values of (-1, 1). The figure at the bottom shows that when the x variables are uncorrelated, the sampler moves almost immediately to the stationary distribution; when they are highly correlated, it takes much longer to get there.

    library(dplyr)
    library(ggplot2)
    library(rjags)
    library(coda)
    set.seed(519)
    mod <- "model{
      for(i in 1:250){
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- inprod(X[i,], b)
      }
      tau ~ dgamma(1,.1)
      b[1] ~ dnorm(0,.01)
      b[2] ~ dnorm(0,.01)
    }"
    
    # Simulate two designs: uncorrelated predictors and predictors correlated at .93
    uncor_X <- MASS::mvrnorm(250, c(0,0), diag(2))
    
    b <- c(-1, 1)  # true coefficient values
    
    y1 <- uncor_X %*% b + rnorm(250, 0, 1)
    
    cor_X <- MASS::mvrnorm(250, c(0,0), matrix(c(1, .93, .93, 1), ncol=2))
    y2 <- cor_X %*% b + rnorm(250, 0, .25)
    
    
    inits <- list(b=c(-25, 25), tau=1)  # start far from the true values (-1, 1)
    
    mod_u <- jags.model(textConnection(mod), 
                        data = list(X = uncor_X, y=c(y1)), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 3
    #>    Total graph size: 1258
    #> 
    #> Initializing model
    samp_u <- coda.samples(mod_u, variable.names = c("b", "tau"), n.iter=100)
    
    mod_c <- jags.model(textConnection(mod), 
                        data = list(X = cor_X, y=c(y2)), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 3
    #>    Total graph size: 1258
    #> 
    #> Initializing model
    samp_c <- coda.samples(mod_c, variable.names = c("b", "tau"), n.iter=100)
    
    # Prepend the starting values as the first row of each chain, then label
    sampc <- tibble::as_tibble(samp_c[[1]]) %>% 
      rbind(c(-25, 25, 1), .) %>% 
      setNames(c("b1", "b2", "tau")) %>% 
      mutate(structure = "Correlated")
    sampu <- tibble::as_tibble(samp_u[[1]]) %>% 
      rbind(c(-25, 25, 1), .) %>% 
      setNames(c("b1", "b2", "tau")) %>% 
      mutate(structure = "Uncorrelated")
    
    samps <- bind_rows(sampu, sampc)
    ggplot(samps, aes(x=b1, y=b2)) + 
      geom_point(alpha=.1) + 
      geom_path(alpha=.1) + 
      geom_text(aes(x=-23, y=25,label="Start"), inherit.aes = FALSE) + 
      theme_bw() + 
      theme(panel.grid=element_blank()) + 
      facet_wrap(~structure, ncol=2)
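
    To put a number on the mixing difference, you could compare effective sample sizes with coda (exact values depend on the seed, but the correlated design should come out far lower):

    coda::effectiveSize(samp_u)  # ESS for the uncorrelated-X chain
    coda::effectiveSize(samp_c)  # ESS for the correlated-X chain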
    

    However, even with correlated data, if we give the coefficients a single bivariate normal prior, so that JAGS can update them jointly as a block, the slow mixing goes away:

    mnmod <- "model{
      for(i in 1:250){
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- inprod(X[i,], b)
      }
      tau ~ dgamma(1,.1)
      b ~ dmnorm(b0, B0)
    }"
    
    mod_mn <- jags.model(textConnection(mnmod), 
                        data = list(X = cor_X, y=c(y2), b0=c(0,0), B0 = diag(2)*.01), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 2
    #>    Total graph size: 1262
    #> 
    #> Initializing model
    samp_mn <- coda.samples(mod_mn, variable.names = c("b", "tau"), n.iter=100)
    
    # Prepend the starting values as the first row of the chain
    sampmn <- tibble::as_tibble(rbind(c(-25, 25, 1), samp_mn[[1]])) %>% 
      setNames(c("b1", "b2", "tau"))
    
    
    ggplot(sampmn, aes(x=b1, y=b2)) + 
      geom_point(alpha=.1) + 
      geom_path(alpha=.1) + 
      geom_text(aes(x=-23, y=25,label="Start"), inherit.aes = FALSE) + 
      theme_bw() + 
      theme(panel.grid=element_blank())
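
    The same diagnostic should show the effective sample size recovering once the coefficients are updated as a block:

    coda::effectiveSize(samp_mn)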
    

    Created on 2023-04-01 with reprex v2.0.2