r, na, bayesian, stan, rstan

Dealing with NAs in Bayesian models


I built a Bayesian model with the usual components (data, parameters, model, likelihood).

This model is a linear regression:

library(ggplot2)
#library (ggedit)
library(plyr)
library(StanHeaders)
library(rstan)

# Equation (1)

for(i in 1:N){
    alphaC_P[i] ~ normal ((alphaC_A[Date[i]]) * (1- (F_T[Date[i]])) +
                            alphaC_T[i] * (F_T[Date[i]]), sigma_C);
    }

Because of its memory requirements, I run this analysis on a cluster.
First I prepare the list of data elements, e.g. (Equation (2)): mylist <- list()

Finally, I run the Bayesian analysis on the cluster.

# Equation (3)

rstan::stan(file=args[2], data= mylist, cores=12, warmup= 48000, 
                          iter= 50000, chains= 4, seed = 14)

# args[2] is the path to the Bayesian model file

My data contain NAs, so my question is:
Where should I add the instruction to omit/ignore/exclude the NAs?
E.g., should it go in Equation (1), (2) or (3)?

Finally, which should I do: omit, ignore, or exclude them?

Thanks in advance


Solution

  • Your code example is not very clear. For example, in your first code snippet, you're mixing R with Stan syntax.

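    From a Stan perspective it's very simple: Stan does not accept NAs in data. You can do two things: remove (or impute) the NAs in R before passing the data to Stan, or declare the missing values as parameters in the Stan program so that they are estimated along with everything else.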

    As to how to deal with NAs, that really depends on your data and data collection process, which only you know the details of (in other words, this needs domain-specific knowledge).
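
    Because Stan rejects NAs in data, any filtering has to happen in R while the data list is built (the question's Equation (2)), not in the Stan program or in the call to rstan::stan(). Below is a minimal sketch of the "drop incomplete rows" route. The data frame obs and its values are hypothetical, and treating alphaC_P, alphaC_T and Date as observation-level data is an assumption based on the model in the question; the date-level quantities (alphaC_A, F_T) are left out.

    ```r
    # Hypothetical observation-level data; column names follow the question's model.
    obs <- data.frame(
      alphaC_P = c(-25.1, NA, -24.3, -26.0, -25.5),
      alphaC_T = c(-27.0, -26.5, NA, -27.2, -26.8),
      Date     = c(1L, 1L, 2L, 2L, 3L)
    )

    # Keep only the rows in which every observation-level variable is present.
    keep  <- complete.cases(obs)
    clean <- obs[keep, ]

    # Assemble the list that is passed to rstan::stan(data = ...).
    # N must be the number of rows that remain after filtering.
    mylist <- list(
      N        = nrow(clean),
      alphaC_P = clean$alphaC_P,
      alphaC_T = clean$alphaC_T,
      Date     = clean$Date
    )

    # Guard: fail early in R rather than inside the cluster job.
    stopifnot(!any(vapply(mylist, anyNA, logical(1))))
    ```

    If dropping rows removes every observation for some date, the Date indices may need to be recoded so that they still match the date-level vectors. Whether dropping rows is appropriate at all is the domain-specific question raised above.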

    Lastly, a lot of operations in Stan are vectorised. So instead of writing e.g.

    for (i in 1:N)
        y[i] ~ normal(mu[i], sigma);
    

    you can (and should) write

    y ~ normal(mu, sigma);