stanrstanpystan

Efficiently sample a collection of multi-normal variables with varying sigma (covariance) matrix


I'm new to Stan, so hoping you can point me in the right direction. I'll build up to my situation to make sure we're on the same page...

If I had a collection of univariate normals, the docs tell me that:

y ~ normal(mu_vec, sigma);

provides the same model as the unvectorized version:

for (n in 1:N)
    y[n] ~ normal(mu_vec[n], sigma);

but that the vectorized version is (much?) faster. Ok, fine, makes good sense.

So the first question is: is it possible to take advantage of this vectorization speedup in the univariate normal case where both the mu and sigma of the samples vary by position in the vector. I.e. if both mu_vec and sigma_vec are vectors (in the previous case sigma was a scalar), then is this:

y ~ normal(mu_vec, sigma_vec);

equivalent to this:

for (n in 1:N)
    y[n] ~ normal(mu_vec[n], sigma_vec[n]);

and if so is there a comparable speedup?

Ok. That's the warmup. The real question is how to best approach the multi-variate equivalent of the above.

In my particular case, I have N of observations of bivariate data for some variable y, which I store in an N x 2 matrix. (For order of magnitude, N is about 1000 in my use case.)

My belief is that the mean of each component of each observation is 0 and that the stdev of each component is each observation is 1 (and I'm happy to hard code them as such). However, my belief is that the correlation (rho) varies from observation to observation as a (simple) function of another observed variable, x (stored in an N element vector). For example, we might say that rho[n] = 2*inverse_logit(beta * x[n]) - 1 for n in 1:N and our goal is to learn about beta from our data. I.e. the covariance matrix for the nth observation would be:

[1,      rho[n]]
[rho[n], 1     ]

My question is what's the best way to put this together in a STAN model so that it isn't slow as heck? Is there a vectorized version of the multi_normal distribution so that I could specify this as:

y ~ multi_normal(vector_of_mu_2_tuples, vector_of_sigma_matrices)

or perhaps as some other similar formulation? Or will I need to write:

for (n in 1:N)
    y[n] ~ multi_normal(vector_of_mu_2_tuples[n], vector_of_sigma_matrices[n])

after having set up vector_of_sigma_matrices and vector_of_mu_2_tuples in an earlier block?

Thanks in advance for any guidance!


Edit to add code

Using python, I can generate data in the spirit of my problem as follows:

import numpy as np
import pandas as pd
import pystan as pys
import scipy as sp
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import seaborn as sns

def gen_normal_data(N, true_beta, true_mu, true_stdevs):
    N = N

    true_beta = true_beta
    true_mu = true_mu
    true_stdevs = true_stdevs

    drivers = np.random.randn(N)
    correls = 2.0 * sp.special.expit(drivers*true_beta)-1.0

    observations = []
    for i in range(N):
        covar = np.array([[true_stdevs[0]**2, true_stdevs[0] * true_stdevs[1] * correls[i]],
                          [true_stdevs[0] * true_stdevs[1] * correls[i], true_stdevs[1]**2]])
        observations.append(sp.stats.multivariate_normal.rvs(true_mu, covar, size=1).tolist())
    observations = np.array(observations)
    
    return {
        'N': N,
        'true_mu': true_mu,
        'true_stdev': true_stdevs,
        'y': observations,
        'd': drivers,
        'correls': correls
    }

and then actually generate the data using:

normal_data = gen_normal_data(100, 1.5, np.array([1., 5.]), np.array([2., 5.]))

Here's what the data set looks like (scatterplot of y colored by correls in the left pane and by drivers in the right pane...so the idea is that the higher the driver the closer to 1 the correl and the lower the driver, the closer to -1 the correl. So would expect red dots on the left pane to be "down-left to up-right" and blue dots to be "up-left to down-right", and indeed they are:

fig, axes = plt.subplots(1, 2, figsize=(15, 6))
x = normal_data['y'][:, 0]
y = normal_data['y'][:, 1]
correls = normal_data['correls']
drivers = normal_data['d']

for ax, colordata, cmap in zip(axes, [correls, drivers], ['coolwarm', 'viridis']):
    color_extreme = max(abs(colordata.max()), abs(colordata.min()))
    sc = ax.scatter(x, y, c=colordata, lw=0, cmap=cmap, vmin=-color_extreme, vmax=color_extreme)
    divider = make_axes_locatable(ax)
    cax = divider.append_axes('right', size='5%', pad=0.05)
    fig.colorbar(sc, cax=cax, orientation='vertical')
fig.tight_layout()

scatter plots

Using the brute force approach, I can set up a STAN model that looks like this:

model_naked = pys.StanModel(
    model_name='naked',
    model_code="""
data {
    int<lower=0> N;
    vector[2] true_mu;
    vector[2] true_stdev;
    real d[N];
    vector[2] y[N];
}

parameters {
    real beta;
}

transformed parameters {
}

model {
    real rho[N];
    matrix[2, 2] cov[N];
    
    for (n in 1:N) {
        rho[n] = 2.0*inv_logit(beta * d[n]) - 1.0;
 
        cov[n, 1, 1] = true_stdev[1]^2;
        cov[n, 1, 2] = true_stdev[1] * true_stdev[2] * rho[n];
        cov[n, 2, 1] = true_stdev[1] * true_stdev[2] * rho[n];
        cov[n, 2, 2] = true_stdev[2]^2;
    }
    
    beta ~ normal(0, 10000);
    for (n in 1:N) {
        y[n] ~ multi_normal(true_mu, cov[n]);
    }
}
"""
)

This fits nicely:

fit_naked = model_naked.sampling(data=normal_data, iter=1000, chains=2)f = fit_naked.plot();
f.tight_layout()

trace for beta

But I'm hoping someone can point me in the right direction for the "marginalized" approach where we break down our bivariate normal into a pair of independent normals that can be blended using the correlation. The reason I need this is that in my actual use case, both dimensions of are fat-tailed. I am happy to model this as a student-t distribution, but the issue is that STAN only allows a single nu to be specified (not one for each dimension), so I think I'll need to find a way to decompose a multi_student_t into a pair of independent student_t's so that I can set the degrees of freedom separately for each dimension.


Solution

  • The univariate normal distribution does accept vectors for any or all of its arguments and it will be faster than looping over the N observations to call it N times with scalar arguments.

    However, the speedup is only going to be linear because the calculations are all the same, but it only has to allocate memory once if you only call it once. The overall wall time is more affected by the number of function evaluations you have to do, which is up to 2^10 - 1 per MCMC iteration (by default), but whether you hit the maximum treedepth depends on the geometry of the posterior distribution you are trying to sample from, which, in turn, depends on everything including the data you condition on.

    The bivariate normal distribution can be written as a product of a marginal univariate normal distribution for the first variable and a conditional univariate normal distribution for the second variable given the first variable. In Stan code, we can utilize element-wise multiplication and division to write its log-density like

    target += normal_lpdf(first_variable | first_means, first_sigmas);
    target += normal_lpdf(second_variable | second_means + 
      rhos .* first_sigmas ./ second_sigmas .* (first_variable - first_means),
      second_sigmas .* sqrt(1 - square(rhos)));
    

    Unfortunately, the more general multivariate normal distribution in Stan does not have an implementation that inputs arrays of covariance matrices.