Tags: r, rstan

Specify initial values in brms


I am trying to fit a brms model in R. Right now it does not converge well, whereas the same model worked well in JAGS. Since the only difference is the initial values, I would like to specify the initial values myself in the brm() function.

My model:

    ### Priors
    
    prior_brms <- c(brms::prior(normal(0, 1), class = "Intercept"),
                    brms::prior(normal(0, 1), class = "b"),
                    brms::prior(inv_gamma(10^-3, 10^-3), class = "sd"))
    
    ### Fit
    
    brms_mod <- brms::brm(formula = Y ~  species + X1 * species + (1|tree), 
                          data = Data_stan,
                          prior= prior_brms,
                          iter = 10000 , warmup = 5000, thin = 5,
                          chains = 2, cores=2)

I read the brm() function documentation, which points me to rstan's documentation, but I still do not understand how to specify the initial values. It says they should be a list of lists using the same parameter names as in the Stan code.

Does someone have an idea how to do this?

Thanks in advance!

EDIT: it turns out the convergence problem has nothing to do with the initial values, but with the fact that my data is so well explained by the model (no bragging intended, the data is simulated ;) ) that there is almost no residual variance, which Stan seems to dislike (previous post about this problem).


Solution

  • Specifying the initial values for the brms package is really simple. You must declare a value for each parameter in your model, and you must do so once for each Markov chain Monte Carlo (MCMC) chain.

    In your case the parameters are the "Intercept", the "b" vector corresponding to the covariates, and the "sigma" (residual standard deviation) parameter of the response distribution that brm() declares by default (Gaussian in your example).
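    If you are unsure which names the generated Stan code uses, you can print the model's Stan code before fitting; `make_stancode()` is part of brms. A minimal sketch, assuming the `Data_stan` data frame from the question is available:

```r
library(brms)
# Print the Stan code brms would generate for this model; the `parameters`
# block lists the names (Intercept, b, sigma, sd_1, z_1, ...) that the
# initial-value lists must match.
cat(make_stancode(Y ~ species + X1 * species + (1 | tree),
                  data = Data_stan))
```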

    You can try something like the following; adjust the starting values ("kicks") to your data.

    rm(list = ls())
    set.seed(943)
    Data_stan <- data.frame(Y = rnorm(20, 0, 1),
                            X1 = rnorm(20, 0, 1) * 5,
                            species = factor(rep(LETTERS[1:4], each = 5)),
                            tree = factor(rep(c("I", "II"), times = 10)))
    
    # Initial values ("kicks")
    init_func <- function(chain_id = 1) {
      list(Intercept = 0.5,
           b = rep(0.3, 7), # one value per population-level coefficient
           sigma = 0.8)
    }
    # See an example output
    init_func(chain_id = 1)
    # In your case just two chains are needed.
    init_list <- list(
      init_func(chain_id = 1),
      init_func(chain_id = 2)
    )
    
    # NUTS configuration
    control <- list(
      adapt_engaged = TRUE,
      adapt_delta = 0.95, #increased from default of 0.8
      stepsize = 0.05, # 0.05 default
      max_treedepth = 15
    )
    ### Priors
    
    prior_brms <- c(brms::prior(normal(0, 1), class = "Intercept"),
                    brms::prior(normal(0, 1), class = "b"),
                    brms::prior(inv_gamma(10^-1, 10^-1), class = "sigma"))
    
    ### Fit
    
    brms_mod <- brms::brm(formula = Y ~  species + X1 * species + (1|tree), 
                          data = Data_stan,
                          prior= prior_brms,
                          iter = 4000 , warmup = 2000, thin = 2,
                          chains = 2, cores=6,
                          inits = init_list, # newer brms versions use `init` instead of `inits`
                          control = control
                          )
    summary(brms_mod)
    

    I don't want to confuse you, but it is recommended to start each Markov chain from a different set of initial values, which you can generate with pseudo-random numbers. Alternatively, you can start all parameters at zero by passing inits = "0" to the brm() function.
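    As a sketch, per-chain random starting values can be generated like this (the length 7 for `b` is an assumption matching the number of population-level coefficients in this particular model):

```r
# Draw small random initial values, keyed on the chain id so that runs
# are reproducible; each chain then starts from a different point.
init_func <- function(chain_id = 1) {
  set.seed(1000 + chain_id)
  list(Intercept = rnorm(1, 0, 0.5),
       b         = rnorm(7, 0, 0.5),   # 7 population-level coefficients here
       sigma     = runif(1, 0.5, 1.5)) # sigma must be positive
}
init_list <- lapply(1:2, init_func) # one list per chain
```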

    Don't be put off by the warnings Stan returns via the brm function. These warnings are what make Stan stand out among Bayesian modeling software: the Hamiltonian Monte Carlo (HMC) method and the NUTS algorithm allow much more detailed diagnoses of chain convergence than other sampling methods such as Gibbs or Metropolis. And that is exactly what we should be concerned about: since the posteriors are obtained by numerical methods, the quality of the chain convergence is what guarantees that they are reliable.
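    As a sketch of the post-fit checks behind those warnings (assuming the fitted `brms_mod` object from above):

```r
# Divergences, treedepth saturation and E-BFMI, reported by rstan:
rstan::check_hmc_diagnostics(brms_mod$fit)
# R-hat near 1 and healthy effective-sample-size ratios indicate convergence:
brms::rhat(brms_mod)
brms::neff_ratio(brms_mod)
```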

    Finally, the convergence problem can have everything to do with the initial values, especially if you are fitting a non-linear model. It depends not only on the initial values but also on the quality of the priors being declared. I don't want to go into details here because it goes beyond the original question, but more can be found in this article https://solomonkurz.netlify.app/post/2021-06-05-don-t-forget-your-inits/ which I recommend reading.