I try to predict the outcome of soccer games based on the number of goals scored and I use the following model:
with pm.Model() as model:
# global model parameters
h = pm.Normal('h', mu = mu, tau = tau)
sd_a = pm.Gamma('sd_a', .1, .1)
sd_d = pm.Gamma('sd_d', .1, .1)
alpha = pm.Normal('alpha', mu=mu, tau = tau)
# team-specific model parameters
a_s = pm.Normal("a_s", mu=0, sd=sd_a, shape=n)
d_s = pm.Normal("d_s", mu=0, sd=sd_d, shape=n)
atts = pm.Deterministic('atts', a_s - tt.mean(a_s))
defs = pm.Deterministic('defs', d_s - tt.mean(d_s))
h_theta = tt.exp(alpha + h + atts[h_t] + defs[a_t])
a_theta = tt.exp(alpha + atts[a_t] + defs[h_t])
# likelihood of observed data
h_goals = pm.Poisson('h_goals', mu=h_theta, observed=observed_h_goals)
a_goals = pm.Poisson('a_goals', mu=a_theta, observed=observed_a_goals)
When I sample the model, the trace plots look fine.
Afterward when I want to calculate the WAIC:
waic = pm.waic(trace, model)
I get the following error:
----> 1 waic = pm.waic(trace, model)
~\Anaconda3\envs\env\lib\site-packages\pymc3\stats_init_.py in wrapped(*args, **kwargs)
22 )
23 kwargs[new] = kwargs.pop(old)
—> 24 return func(*args, **kwargs)
25
26 return wrapped
~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats.py in waic(data, pointwise, scale)
1176 “”"
1177 inference_data = convert_to_inference_data(data)
-> 1178 log_likelihood = _get_log_likelihood(inference_data)
1179 scale = rcParams[“stats.ic_scale”] if scale is None else scale.lower()
1180
~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats_utils.py in get_log_likelihood(idata, var_name)
403 var_names.remove(“lp”)
404 if len(var_names) > 1:
–> 405 raise TypeError(
406 “Found several log likelihood arrays {}, var_name cannot be None”.format(var_names)
407 )
TypeError: Found several log likelihood arrays [‘h_goals’, ‘a_goals’], var_name cannot be None
Is there any way to calculate WAIC and compare models when I have two likelihood functions in pymc3? (1: the goals scored by the home 2: the goals scored by the away team)
It is possible but requires defining what are you interested in predicting, it can be the result of the match, or could be the number of goals scored by either team (not the aggregate, each match would then provide 2 results to predict).
A complete and detailed answer is available at PyMC discourse.
Here I transcribe the case where the quantity of interest is the result of the match as a summary. ArviZ will automatically retrieve 2 pointwise log likelihood arrays, which we have to combine somehow (e.g. add, concatenate, groupby...) to get a single array. The tricky part is knowing which operation corresponds to each quantity, which has to be assessed on a per model basis. In this particular example, the predictive accuracy of a match result can be calculated in the following way:
dims = {
"home_points": ["match"],
"away_points": ["match"],
}
idata = az.from_pymc3(trace, dims=dims, model=model)
Setting the match
dim is important to tell xarray how to align the pointwise log likelihood arrays, otherwise they would not be broadcasted and aligned in the desired way.
idata.sample_stats["log_likelihood"] = (
idata.log_likelihood.home_points + idata.log_likelihood.away_points
)
az.waic(idata)
# Output
# Computed from 3000 by 60 log-likelihood matrix
#
# Estimate SE
# elpd_waic -551.28 37.96
# p_waic 46.16 -
#
# There has been a warning during the calculation. Please check the results.
Note that ArviZ>=0.7.0 is required.