r, parallel-processing, simulation, mcmc, runjags

Update number of observations in each simulation (using libraries: runjags, parallel)


I am splitting my dataset by simulation ID and applying a runjags function to each subset simultaneously.

Right now, each simulation contains 1000 observations. I know that sometimes the number of observations will differ since I will be dropping rows that meet certain criteria. I don't know how many observations will be dropped but I can calculate that by using groupobs <- fulldata %>% count(SimulID, sort=TRUE).

Is there a way to change N=1000 for each simulation run? It would mean rewriting the tempModel.txt file for every simulation that is run.

Thank you.

#Subset data by SimulID
subsetdata <- split(fulldata, as.factor(fulldata$SimulID))
#Count obs within each group
groupobs <- fulldata %>% count(SimulID, sort=TRUE)

modelString <- "
  model{
#Model specification
   for (i in 1:1000) {
      y[i]~dnorm(muy[i], Inv_sig2_e)
      muy[i]<-b0+b1*x1[i]+b2*x2[i]
   }
#priors
   b0~dnorm(0, 1.0E-6)
   b1~dnorm(0, 1.0E-6)
   b2~dnorm(0, 1.0E-6)
   Inv_sig2_e~dgamma(1.0E-3, 1.0E-3)
#parameter transformation
   Sig2_e<-1/Inv_sig2_e
  }
"

writeLines(modelString, "tempModel.txt")

output_models <- lapply(subsetdata, function(x){
  model_data = x
  initsList1 <- list(b0=1, b1=1, b2=1, Inv_sig2_e=1)
  initsList2 <- list(b0=1, b1=2, b2=3, Inv_sig2_e=1)
  initsList3 <- list(b0=2, b1=3, b2=4, Inv_sig2_e=1)

  runJagsOut <- run.jags(method = "parallel",
                         model = "tempModel.txt",
                         monitor = c("b0", "b1", "b2", "Sig2_e"),
                         data = model_data,
                         # One set of initial values per chain
                         inits = list(initsList1, initsList2, initsList3),
                         n.chains = 3,
                         adapt = 500,
                         burnin = 2500,
                         sample = 2500,
                         thin = 1,
                         summarise = FALSE,
                         plots = FALSE)
})

Solution

  • You have several options

    You could construct the model string on the fly. [The model argument to run.jags can contain a character string instead of a file name, so there's no need to write to a file and then read it in again.]

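    You can add an element to your data list (x in your code) that contains the number of observations in the current subset. Note that fulldata %>% count(SimulID, sort=TRUE) returns a table of counts for every simulation, not a single number, so use the row count of x instead. Because x is a data frame in your code, convert it to a list first so that the count is stored as a scalar,

    x <- c(as.list(x), groupobs = nrow(x))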
    

    and refer to that in your model_string:

    for (i in 1:groupobs)
    

    You could calculate the number of observations on the fly:

    for (i in 1:length(y))
    

    in your model_string.
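
Applied to the regression model in the question, the second suggestion might look like the sketch below. The toy fulldata, the helper make_jags_data, and the element name N are assumptions made for illustration and are not part of the original code; the loop in the model string would then read for (i in 1:N).

```r
# Hypothetical stand-in for the OP's fulldata: two simulations of unequal size.
set.seed(1)
fulldata <- data.frame(
  SimulID = rep(c(1, 2), times = c(5, 3)),
  x1 = rnorm(8),
  x2 = rnorm(8)
)
fulldata$y <- 1 + 2 * fulldata$x1 - fulldata$x2 + rnorm(8)

# Subset data by SimulID, as in the question.
subsetdata <- split(fulldata, fulldata$SimulID)

# Build one JAGS data list per simulation.
# N (an assumed name) is the row count of that subset, stored as a scalar.
make_jags_data <- function(x) {
  list(y = x$y, x1 = x$x1, x2 = x$x2, N = nrow(x))
}

jags_data <- lapply(subsetdata, make_jags_data)

# Inspect the per-simulation observation counts.
sapply(jags_data, function(d) d$N)
```

Each element of jags_data could then be passed as the data argument of run.jags inside the existing lapply call, so the model file never needs rewriting.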

    Edit: In response to OP's comment, here are implementations of each of my three suggestions above. The OP's code is not reproducible as they haven't provided their data, so I will reanalyse an example used by O'Quigley et al in their 1990 CRM paper. To reproduce OP's grouped analysis, I'll duplicate the data and simply analyse it twice.

    Input data:

    dput(observedData)
    structure(list(Cohort = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
    10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
    23L, 24L, 25L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
    12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
    25L), SubjectID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
    11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
    24L, 25L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
    13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L
    ), Dose = c(3, 4, 4, 3, 3, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 2, 2, 2, 2, 1, 1, 3, 4, 4, 3, 3, 2, 1, 1, 1, 2, 2, 2, 2, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1), Toxicity = c(0, 0, 1, 0, 
    1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 
    0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 
    1, 0, 1, 1), Trial = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), row.names = c(NA, 
    -50L), class = c("tbl_df", "tbl", "data.frame"))
    

    I find the tidyverse's group_map function provides code that is both more compact and easier to understand than lapply, so I'll use that.

    library(tidyverse)
    library(runjags)
    

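    Option 1: paste the observation count into the model string. (fit1 builds its own string, so the template immediately below, with its unresolved n, is never passed to JAGS.)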

    modelString <- 
    "model { 
      #Prior 
      a ~ dexp(1) 
      #Likelihood 
      for (i in 1:n) { 
        Toxicity[i] ~ dbern(((tanh(XHat[i]) + 1)/2)**a) 
      } 
    } 
    #monitor# a"
    
    fit1 <- function(.x, .y) {
      modelString <- paste0(
        "model { 
          #Prior 
          a ~ dexp(1) 
          #Likelihood 
          for (i in 1:",
          .x %>% nrow(),
          ") { 
            Toxicity[i] ~ dbern(((tanh(XHat[i]) + 1)/2)**a) 
          } 
        } 
        #monitor# a")
      d <- list(XHat=.x$Dose, Toxicity=.x$Toxicity)
      run.jags(modelString, data=d)
    }
    
    observedData %>% group_by(Trial) %>% group_map(fit1)
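
As a possible variation on Option 1, the substitution can be isolated in a small helper so the generated model text is easy to inspect before it is handed to run.jags. This is only a sketch: make_model_string is a hypothetical name, and sprintf is swapped in for paste0.

```r
# Hypothetical helper: substitute the observation count into the loop bound.
# sprintf replaces %d with n; the rest of the model text is unchanged.
make_model_string <- function(n) {
  sprintf(
"model {
  #Prior
  a ~ dexp(1)
  #Likelihood
  for (i in 1:%d) {
    Toxicity[i] ~ dbern(((tanh(XHat[i]) + 1)/2)**a)
  }
}
#monitor# a", n)
}

# Print the model that would be generated for a group of 25 observations.
cat(make_model_string(25L))
```

Inside a fitting function, make_model_string(nrow(.x)) would take the place of the paste0 call.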
    

    Option 2: pass the observation count as an element of data.

    modelString <- 
      "model { 
        #Prior 
        a ~ dexp(1) 
        #Likelihood 
        for (i in 1:n) { 
          Toxicity[i] ~ dbern(((tanh(XHat[i]) + 1)/2)**a) 
        } 
       } 
       #monitor# a"
    
    fit2 <- function(.x, .y) {
      d <- list(XHat=.x$Dose, Toxicity=.x$Toxicity, n=.x %>% nrow())
      run.jags(modelString, data=d)
    }
    
    observedData %>% group_by(Trial) %>% group_map(fit2)
    

    Option 3: let JAGS calculate the observation count.

    modelString <- 
      "model { 
        #Prior 
        a ~ dexp(1) 
        #Likelihood 
        for (i in 1:length(Toxicity)) { 
          Toxicity[i] ~ dbern(((tanh(XHat[i]) + 1)/2)**a) 
        } 
      } 
      #monitor# a"
    
    fit3 <- function(.x, .y) {
      d <- list(XHat=.x$Dose, Toxicity=.x$Toxicity)
      run.jags(modelString, data=d)
    }
    
    observedData %>% group_by(Trial) %>% group_map(fit3)
    

    My personal preference is for option 2.

    I've used .x and .y as argument names to the three fitX functions to match the convention used in the online documentation for group_map.
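
To see what group_map hands to such a function, the sketch below replaces the JAGS call with a function that only reports the group key and the row count. The toy tibble and the name describe_group are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data: two trials of unequal size.
toy <- tibble(
  Trial = c(1, 1, 1, 2, 2),
  Dose = c(3, 4, 4, 2, 1),
  Toxicity = c(0, 0, 1, 1, 0)
)

# .x is the subset of rows for one group (without the grouping column);
# .y is a one-row tibble holding that group's key.
describe_group <- function(.x, .y) {
  tibble(Trial = .y$Trial, n = nrow(.x))
}

# group_map returns a list with one element per group; bind_rows stacks them.
counts <- toy %>%
  group_by(Trial) %>%
  group_map(describe_group) %>%
  bind_rows()

counts
```

The n column shows that nrow(.x) changes with each group, which is what lets the same model string serve subsets of different sizes.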