This is essentially the "Multiple Coins from Multiple Mints / Baseball Players" example from Doing Bayesian Data Analysis, Second Edition (DBDA2). I believe I have PyMC3 code which is functionally equivalent, but one works and the other does not. This is with PyMC version 3.5. In more detail,
Let's say I have the following data. Each row is an observation:
observations_dict = {
'mint': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'coin': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7],
'outcome': [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1]
}
observations = pd.DataFrame(observations_dict)
observations
The below, which implements DBDA2 Figure 9.7, runs just fine:
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model:
# mint is characterized by omega and kappa
omega = pm.Beta('omega', 1., 1.)
kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
# each coin is described by a theta
theta = pm.Beta('theta', alpha=omega*(kappa-2)+1, beta=(1-omega)*(kappa-2)+1, shape=num_coins)
# define the likelihood
y = pm.Bernoulli('y', theta[coin_idx], observed=observations['outcome'])
However, once this is turned into a hierarchical model (as seen in DBDA2 Figure 9.13):
num_mints = observations['mint'].nunique()
mint_idx = observations['mint']
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model2:
# Hyper parameters
omega = pm.Beta('omega', 1, 1)
kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
# Parameters for mints
omega_c = pm.Beta('omega_c',
omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
shape = num_mints)
kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
0.01, 0.01,
shape = num_mints)
kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
# Parameters for coins
theta = pm.Beta('theta',
omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
(1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
shape = num_coins)
y2 = pm.Bernoulli('y2', p=theta[coin_idx], observed=observations['outcome'])
The error is:
ValueError: operands could not be broadcast together with shapes (8,) (20,)
as the model has 8 thetas for 8 coins but sees 20 rows of data.
However, if the data is grouped such that each line represents the final statistics of an individual coin, as with the following
grouped = observations.groupby(['mint', 'coin']).agg({'outcome': [np.sum, np.size]}).reset_index()
grouped.columns = ['mint', 'coin', 'heads', 'total']
And the final likelihood variable is changed to a Binomial, as follows
num_mints = grouped['mint'].nunique()
mint_idx = grouped['mint']
num_coins = grouped['coin'].nunique()
coin_idx = grouped['coin']
with pm.Model() as hierarchical_model2:
# Hyper parameters
omega = pm.Beta('omega', 1, 1)
kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
# Parameters for mints
omega_c = pm.Beta('omega_c',
omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
shape = num_mints)
kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
0.01, 0.01,
shape = num_mints)
kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
# Parameter for coins
theta = pm.Beta('theta',
omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
(1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
shape = num_coins)
y2 = pm.Binomial('y2', n=grouped['total'], p=theta, observed=grouped['heads'])
Everything works. Now, the latter form is more efficient and generally preferred, but I believe the former should work as well. So I believe this is primarily a PyMC3 issue (or even more likely, a user error).
To quote DBDA Edition 1,
"The BUGS model uses a binomial likelihood distribution for total correct, instead of using the Bernoulli distribution for individual trials. This use of the binomial is just a convenience for shortening the program. If the data were specified as trial-by-trial outcomes instead of as total correct, then the model could include a trial-by-trial loop and use a Bernoulli likelihood function"
What bothers me is that in the very first example (One Mint, Several Coins), it looks like PyMC3 can handle individual observations instead of aggregated observations just fine. So I believe the first form should work, but doesn't.
http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%209.ipynb
PyMC3 - Differences in ways observations are passed to model -> difference in results?
http://www.databozo.com/deep-in-the-weeds-complex-hierarchical-models-in-pymc3
https://stats.stackexchange.com/questions/157521/is-this-correct-hierarchical-bernoulli-model
The length of mint_idx
was 20 (one for each observation), but it should have been 8 (one for each coin).
Working answer, notice the mint_idx
recalculation (rest remains the same):
grouped = observations.groupby(['mint', 'coin']).agg({'outcome': [np.sum, np.size]}).reset_index()
grouped.columns = ['mint', 'coin', 'heads', 'total']
num_mints = grouped['mint'].nunique()
mint_idx = grouped['mint']
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model2:
# Hyper parameters
omega = pm.Beta('omega', 1, 1)
kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
# Parameters for mints
omega_c = pm.Beta('omega_c',
omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
shape = num_mints)
kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
0.01, 0.01,
shape = num_mints)
kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
# Parameters for coins
theta = pm.Beta('theta',
omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
(1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
shape = num_coins)
y2 = pm.Bernoulli('y2', p=theta[coin_idx], observed=observations['outcome'])
Many thanks to @junpenglao!! https://discourse.pymc.io/t/why-cant-i-use-a-bernoulli-as-a-likelihood-variable-in-a-hierarchical-model-in-pymc3/2022/2