Tags: r, bayesian, mcmc, jags, rjags

JAGS in R: Resulting MCMC chain does not start near initial values


I am running the code in this simple rjags tutorial. Notice that the initial values of the parameters are specified by `model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0)`. I wanted to verify that the resulting MCMC chain actually starts at those initial values.

Therefore, for testing purposes, I set `n.adapt=0` within `jags.model()` so that no adaptation takes place. Printing `head(model.samples[[1]])` shows `beta0` at 0.58, `beta1` at 0.97, and `beta2` at 0.54 in the first row. So even after just the first iteration, none of the parameters are anywhere near their initial values.
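
For reference, a minimal sketch of the kind of setup I mean (the tutorial's exact model and data may differ):

    library(rjags)
    
    model_string <- "model{
      for(i in 1:N){
        y[i] ~ dnorm(beta0 + beta1*x1[i] + beta2*x2[i], tau)
      }
      beta0 ~ dnorm(0, .01)
      beta1 ~ dnorm(0, .01)
      beta2 ~ dnorm(0, .01)
      tau ~ dgamma(.01, .01)
    }"
    
    set.seed(1)
    N <- 500; x1 <- rnorm(N); x2 <- rnorm(N)
    y <- 2 + x1 - x2 + rnorm(N)  # made-up data standing in for the tutorial's
    
    model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0)
    model <- jags.model(textConnection(model_string),
                        data = list(y=y, x1=x1, x2=x2, N=N),
                        inits = model.inits,
                        n.adapt = 0)  # no adaptation, for testing
    model.samples <- coda.samples(model, c("beta0","beta1","beta2","tau"),
                                  n.iter = 1000)
    head(model.samples[[1]])  # first row is already past the initial values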

I would appreciate an explanation of whether I'm missing something here.


Solution

  • The problem is that JAGS does not record the initial values as the first of the sampled values. Using your example, I put the model in a file called regression.mod, the data in a file called regression.dat, and the initial values in a file called regression.inits. Then I opened a terminal and started JAGS directly (rather than running it through R, though it works the same way from R) and did the following:

    (base) david@Daves-MacBook-Pro Downloads % jags              
    Welcome to JAGS 4.3.1 (official binary) on Fri Mar 31 07:27:34 2023
    JAGS is free software and comes with ABSOLUTELY NO WARRANTY
    Loading module: basemod: ok
    Loading module: bugs: ok
    . model in regression.mod
    
    . data in regression.dat
    Reading data file regression.dat
    
    . compile
    Compiling model graph
       Resolving undeclared variables
       Allocating nodes
    Graph information:
       Observed stochastic nodes: 500
       Unobserved stochastic nodes: 4
       Total graph size: 3023
    
    . parameters in regression.inits
    Reading parameter file regression.inits
    
    . initialize   
    Initializing model
    
    . parameters to actual.txt, chain(1)
    

    The output of `parameters to` shows the current state of the initialized model, which confirms that the initial values were set appropriately:

    `.RNG.name` <- "base::Wichmann-Hill"
    `.RNG.state` <- 
    c(16850L,14297L,4084L)
    `beta0` <- 
    1
    `beta1` <- 
    0
    `beta2` <- 
    0
    `tau` <- 
    1
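
    As an aside, the `.RNG.name` and `.RNG.state` entries are the state of the sampler's random number generator, which JAGS stores alongside the parameters. From R, the rjags documentation notes that you can pin the RNG down through the inits list for reproducibility, e.g.:

    model.inits <- list(tau=1, beta0=1, beta1=0, beta2=0,
                        .RNG.name="base::Wichmann-Hill", .RNG.seed=1)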
    

    Next, I took one draw from the posterior and saved the values for that one draw:

    . monitor tau      
    . monitor beta1
    . monitor beta2
    . monitor beta0
    . update 1
    Updating 1
    
    . coda *, stem(samp1_)
    

    The sampled values from that single draw were:

    tau    0.336252
    beta1  0.896834
    beta2  0.583655
    beta0  0.393297
    

    The other telltale sign is that if you set monitors immediately after initialization and save the chain values, the files are empty, meaning that JAGS does not include the initial values among the sampled values.
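
    You can run the same check from R without leaving rjags: the object returned by `jags.model()` exposes the current chain state through its `state()` method. A minimal sketch, reusing the `model` and `model.inits` names from the question:

    # Immediately after jags.model(..., inits = model.inits, n.adapt = 0):
    model$state()  # current chain state: should match model.inits
    
    one_draw <- coda.samples(model, c("beta0","beta1","beta2","tau"), n.iter=1)
    one_draw[[1]][1, ]  # one update later: no longer the initial values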


    Edit: answering questions from the comments

    The question in the comments is: shouldn't the chain move away from the initial values more slowly? The answer is, it depends. In particular, it depends on the correlation structure of the X variables (and, by extension, the correlation structure of the parameters, and whether that structure is built into the model). Below is a simple two-predictor regression problem (without an intercept). In one model the x variables are uncorrelated; in the other, they correlate at 0.93 (in the population). The priors on the regression coefficients are independent normals. I started the b values at (-25, 25), far away from their true values of (-1, 1). The figure at the bottom shows that when the x variables are uncorrelated, the sampler moves almost immediately to the stationary distribution; when they are highly correlated, it takes much longer to get there.

    library(dplyr)
    library(ggplot2)
    library(rjags)
    library(coda)
    set.seed(519)
    mod <- "model{
      for(i in 1:250){
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- inprod(X[i,], b)
      }
      tau ~ dgamma(1,.1)
      b[1] ~ dnorm(0,.01)
      b[2] ~ dnorm(0,.01)
    }"
    
    # Simulate two designs: uncorrelated predictors and predictors correlated at .93
    uncor_X <- MASS::mvrnorm(250, c(0,0), diag(2))
    
    b <- c(-1, 1)  # true coefficient values
    
    y1 <- uncor_X %*% b + rnorm(250, 0, 1)
    
    cor_X <- MASS::mvrnorm(250, c(0,0), matrix(c(1, .93, .93, 1), ncol=2))
    y2 <- cor_X %*% b + rnorm(250, 0, .25)
    
    
    inits <- list(b=c(-25, 25), tau=1)  # start far from the true values (-1, 1)
    
    mod_u <- jags.model(textConnection(mod), 
                        data = list(X = uncor_X, y=c(y1)), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 3
    #>    Total graph size: 1258
    #> 
    #> Initializing model
    samp_u <- coda.samples(mod_u, variable.names = c("b", "tau"), n.iter=100)
    
    mod_c <- jags.model(textConnection(mod), 
                        data = list(X = cor_X, y=c(y2)), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 3
    #>    Total graph size: 1258
    #> 
    #> Initializing model
    samp_c <- coda.samples(mod_c, variable.names = c("b", "tau"), n.iter=100)
    
    # Prepend the starting values as the first row of each chain, then label
    sampc <- tibble::as_tibble(samp_c[[1]]) %>% 
      rbind(c(-25, 25, 1), .) %>% 
      setNames(c("b1", "b2", "tau")) %>% 
      mutate(structure = "Correlated")
    sampu <- tibble::as_tibble(samp_u[[1]]) %>% 
      rbind(c(-25, 25, 1), .) %>% 
      setNames(c("b1", "b2", "tau")) %>% 
      mutate(structure = "Uncorrelated")
    
    samps <- bind_rows(sampu, sampc)
    ggplot(samps, aes(x=b1, y=b2)) + 
      geom_point(alpha=.1) + 
      geom_path(alpha=.1) + 
      geom_text(aes(x=-23, y=25,label="Start"), inherit.aes = FALSE) + 
      theme_bw() + 
      theme(panel.grid=element_blank()) + 
      facet_wrap(~structure, ncol=2)
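
    To put a number on the mixing difference, you could compare effective sample sizes with coda (exact values depend on the seed, but the correlated design should come out far lower):

    coda::effectiveSize(samp_u)  # ESS for the uncorrelated-X chain
    coda::effectiveSize(samp_c)  # ESS for the correlated-X chain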
    

    However, even with correlated data, if we give the coefficients a single bivariate normal prior, so that JAGS can update them jointly as a block, the slow mixing goes away:

    mnmod <- "model{
      for(i in 1:250){
        y[i] ~ dnorm(mu[i], tau)
        mu[i] <- inprod(X[i,], b)
      }
      tau ~ dgamma(1,.1)
      b ~ dmnorm(b0, B0)
    }"
    
    mod_mn <- jags.model(textConnection(mnmod), 
                        data = list(X = cor_X, y=c(y2), b0=c(0,0), B0 = diag(2)*.01), 
                        inits = inits, 
                        n.adapt=0)
    #> Compiling model graph
    #>    Resolving undeclared variables
    #>    Allocating nodes
    #> Graph information:
    #>    Observed stochastic nodes: 250
    #>    Unobserved stochastic nodes: 2
    #>    Total graph size: 1262
    #> 
    #> Initializing model
    samp_mn <- coda.samples(mod_mn, variable.names = c("b", "tau"), n.iter=100)
    
    # Prepend the starting values as the first row of the chain
    sampmn <- tibble::as_tibble(rbind(c(-25, 25, 1), samp_mn[[1]])) %>% 
      setNames(c("b1", "b2", "tau"))
    
    
    ggplot(sampmn, aes(x=b1, y=b2)) + 
      geom_point(alpha=.1) + 
      geom_path(alpha=.1) + 
      geom_text(aes(x=-23, y=25,label="Start"), inherit.aes = FALSE) + 
      theme_bw() + 
      theme(panel.grid=element_blank())
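
    The same diagnostic should show the effective sample size recovering once the coefficients are updated as a block:

    coda::effectiveSize(samp_mn)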
    

    Created on 2023-04-01 with reprex v2.0.2