Tags: variables, syntax, declaration, stan, pystan

Stan variable declarations: difference in use between var_type var_name[length] and vector[length] var_name


I am new to Stan and I'm struggling to understand how the different variable declaration styles are used. In particular, I am confused about when I should put the square brackets after the variable type and when I should put them after the variable name. For example, given

int<lower = 0> L; // length of my data

let's consider:

real N[L]; // my variable

versus

vector[L] N; // my variable

From what I understand, both declare a variable N as a vector of length L. Is the only difference between the two that the first way specifies the variable type? Can they be used interchangeably? Should they belong to different parts of the Stan code (e.g., data vs. parameters or model)?

Thanks for explaining!


Solution

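  • real name[size] and vector[size] name can be used almost interchangeably. They are stored differently internally, so one may perform better than the other in a given model. Some operations are also restricted to one type or the other (e.g. arithmetic such as multiplication, and linear algebra in general, is defined for vector but not for arrays of real), and the most efficient loop order differs. For example, a matrix is stored in column-major order, so it is faster to loop over columns in the outer loop and rows in the inner loop, whereas a 2-D array is stored in row-major order, so the outer loop should run over the first (row) index. These details will matter once you have a more specific model. The way to read the two declarations is: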

    real name[size];   // Stan 2.26 and later: array[size] real name;
    

    means that name is an array whose elements are of type real, i.e. a collection of reals stored together.

    vector[size] name;
    

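    means that name is a vector with size elements, which is also a collection of reals stored together. However, Stan's vector type is built on the Eigen C++ linear algebra library, which gives it access to operations (such as matrix-vector multiplication and dot products) that arrays of real do not support.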

    You can also create arrays of vectors like this:

    vector[N] name[K];   // Stan 2.26 and later: array[K] vector[N] name;
    

    which is going to produce an array of K vectors of size N.

    Bottom line: you can usually get a model running with either vector or an array of real, but the two are not necessarily equivalent in computational efficiency.
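A minimal sketch, assuming Stan 2.26 or later. From that version on, arrays are declared with the array[...] syntax, so real N[L] is written array[L] real N and vector[N] name[K] is written array[K] vector[N] name (the older form used above was deprecated in Stan 2.26 and removed in Stan 2.33). The program declares the same three reals both ways, shows arithmetic that works on the vector, converts between the two containers with to_vector and to_array_1d, and builds an array of vectors. The variable names and hard-coded values are made up for illustration; since the program has no data or parameters, it is meant to be run with the fixed_param sampler.

```stan
// Sketch (assumes Stan 2.26+ array syntax): the same L reals as an array and as a vector.
// All names and values are hypothetical; run with algorithm=fixed_param.
transformed data {
  int<lower=0> L = 3;                    // length of the data
  array[L] real N_arr = {1.0, 2.0, 3.0}; // old syntax: real N_arr[L];
  vector[L] N_vec = [1.0, 2.0, 3.0]';    // row-vector literal transposed to a vector

  // Arithmetic and linear algebra are defined for vector (Eigen-backed);
  // operators such as * are not defined for array[] real.
  vector[L] doubled = 2 * N_vec;         // scalar * vector
  real ss = N_vec' * N_vec;              // row_vector * vector = dot product

  // Convert when an operation needs the other container type.
  vector[L] from_arr = to_vector(N_arr);
  array[L] real back = to_array_1d(N_vec);

  // Many functions, such as sum, accept either type.
  real s1 = sum(N_arr);
  real s2 = sum(N_vec);

  // An array of K vectors, each of length L (old syntax: vector[L] V[K];).
  int K = 2;
  array[K] vector[L] V;

  for (k in 1:K) {
    V[k] = k * N_vec;                    // each V[k] is itself a vector
  }

  // Self-check: these values are exact in floating point.
  if (ss != 14 || s1 != s2 || from_arr[3] != N_vec[3] || back[1] != N_arr[1])
    reject("unexpected result");
  print("doubled = ", doubled, ", ss = ", ss, ", V[2] = ", V[2]);
}
```

All declarations come before the first statement (the for loop) to stay within the rules of older Stan versions as well.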
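A second sketch, under the same Stan 2.26+ assumption, shows loop order matching the storage layout described in the Stan documentation: matrices (Eigen-based) are stored column-major, so the inner loop should run over rows, while 2-D arrays are stored row-major, so the inner loop should run over columns. R, C, M and A are hypothetical names, and with sizes this small the ordering has no measurable effect; the pattern only pays off for large containers.

```stan
// Sketch (assumes Stan 2.26+): fill a matrix and a 2-D array in storage order.
// Names and sizes are hypothetical; run with algorithm=fixed_param.
transformed data {
  int R = 2;
  int C = 3;
  matrix[R, C] M;          // column-major storage
  array[R, C] real A;      // row-major storage; old syntax: real A[R, C];

  // Matrix: columns in the outer loop, rows in the inner loop.
  for (c in 1:C) {
    for (r in 1:R) {
      M[r, c] = 10 * r + c;
    }
  }

  // 2-D array: rows (first index) in the outer loop, columns in the inner loop.
  for (r in 1:R) {
    for (c in 1:C) {
      A[r, c] = 10 * r + c;
    }
  }

  // Both containers hold the same values, only laid out differently in memory.
  if (sum(M) != sum(to_array_1d(A)))
    reject("matrix and array contents differ");
}
```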