[SOLVED] effective sample size 2d array using arviz and pymc3 mcmc

I'm trying to get the effective sample size for a 2D MCMC chain, using PyMC3 and ArviZ.

# In a notebook, install ArviZ first with: !pip install arviz
import pymc3 as pm3
import arviz as az

ess = az.ess(samples)

The above code works for 1D but not for 2D. I see there is an az.convert_to_dataset function that might help, but I can't figure out how to use it.

Samples would be an N x 2 array, and the result should be a single number.

Thanks!

Solution

When working with arrays, ArviZ assumes the following shape convention:

1d array represents the draws of a single chain of a scalar variable: (draw,)
2d array represents the draws of multiple chains of a scalar variable: (chain, draw)
3d+ array represents the draws of multiple chains of multidimensional variables: (chain, draw, *shape)
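To see the convention in action, here is a minimal sketch that converts three synthetic arrays and prints how ArviZ labels their dimensions. The random arrays and their sizes are made up for illustration, and the behaviour shown assumes the ArviZ 0.x API (az.convert_to_dataset with its default variable name).

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)

# Synthetic stand-ins for MCMC output (illustrative only)
arrays = {
    "one_d": rng.normal(size=100),            # (draw,)
    "two_d": rng.normal(size=(4, 100)),       # (chain, draw)
    "three_d": rng.normal(size=(4, 100, 2)),  # (chain, draw, *shape)
}

# Convert each array and record the dimension sizes ArviZ inferred
sizes = {}
for name, ary in arrays.items():
    ds = az.convert_to_dataset(ary)
    sizes[name] = dict(ds.sizes)
    print(name, sizes[name])
```

The 1d array should come back as a single chain, and the 3d array should gain one extra dimension for the trailing axis of length 2.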

I am not sure why the 2d case is not working for you; I suspect there may not be enough draws to calculate the ESS.

To make sure that your dimensions are being correctly interpreted, I would recommend doing idata = az.convert_to_inference_data(ary) and then checking idata.posterior to see the dimensions of the generated object. You can then call az.ess(idata) to get the effective sample size.
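As a rough sketch of that check, the snippet below uses a synthetic (chain, draw) array of i.i.d. normal draws in place of real sampler output. The array sizes are arbitrary, and the variable name "x" is what ArviZ 0.x assigns by default to a bare array, so treat both as assumptions.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)

# Synthetic stand-in: 4 chains with 500 draws each of a scalar variable
ary = rng.normal(size=(4, 500))

idata = az.convert_to_inference_data(ary)

# Inspect the dimensions ArviZ inferred for the posterior group
print(idata.posterior)
posterior_sizes = dict(idata.posterior.sizes)

# ESS for the scalar variable, stored under the default name "x"
ess = az.ess(idata)
print(ess)
```

If the printed posterior shows chain and draw swapped relative to what you expect, the input array needs reshaping before calling az.ess.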

EDIT: If I understood your comments correctly, you are generating an array with shape (draw=N, parameter_dim=2) because you are sampling only a single chain. Since this is a 2d array, ArviZ interprets it as N chains of 2 draws each, which should trigger a warning about having more chains than draws. You can reshape the array to match the ArviZ convention with:

import numpy as np

# add a leading chain axis: (N, 2) -> (1, N, 2)
idata = az.convert_to_inference_data(np.expand_dims(samples, 0))
# or, equivalently (here we also choose the name of the variable)
idata = az.from_dict({"position": np.expand_dims(samples, 0)})

which will generate a (1, N, 2) array whose dimensions ArviZ will understand. I have also included the conversion to InferenceData, since having an InferenceData object lets you call any ArviZ function without worrying about dimensions anymore.

If your array instead had shape (2, N), transposing it before expanding the axis should solve the problem:

idata = az.convert_to_inference_data(np.expand_dims(samples.T, 0))
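Putting the pieces together, here is a minimal end-to-end sketch for the single-chain (N, 2) case. The samples are synthetic i.i.d. normal draws standing in for real MCMC output, and "x" is the default variable name in ArviZ 0.x. Note that az.ess returns one value per component of the parameter, so you get two numbers here; collapsing them to a single number with the minimum is just one conservative choice, not something ArviZ does for you.

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(2)

# Synthetic stand-in for one chain of N draws of a 2-dimensional parameter
N = 1000
samples = rng.normal(size=(N, 2))

# Add the chain axis so the shape becomes (chain=1, draw=N, 2)
idata = az.convert_to_inference_data(np.expand_dims(samples, 0))
posterior_sizes = dict(idata.posterior.sizes)

# One ESS value per component of the parameter
ess_per_dim = az.ess(idata)["x"].values
print(ess_per_dim)

# Assumption: summarise with the smallest ESS across components
print(ess_per_dim.min())
```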