pythonpython-xarraypymc3arviz

Plot fit of gamma distribution with pymc3


Suppose that I generate some sample data using pymc3 for a gamma distribution:

import pymc3 as pm
import arviz as az

# generate fake data:
with pm.Model() as model2:
    g = pm.Gamma('g', alpha=1.7, beta=0.097)
    
syn = g.random(size=1000)
plt.hist(syn, bins=50);

enter image description here

Now, I will create a model to fit a gamma distribution on that data:

model = pm.Model()

with model: 

    # alpha
    alpha = pm.Exponential('alpha', lam=2)

    # beta
    beta = pm.Exponential('beta', lam=0.1)

    g = pm.Gamma('g', alpha=alpha, beta=beta, observed=syn)

    trace = pm.sample(2000, return_inferencedata=True)

This will correctly get the values and distribution that created the original fake data. Now, I want to plot the pdf (but I don't know how to do that!). I saw an example that did this:

with model:
    post_pred = pm.sample_posterior_predictive(trace.posterior)
# add posterior predictive to the InferenceData
az.concat(trace, az.from_pymc3(posterior_predictive=post_pred), inplace=True)

which creates a matrix that contains samples from the estimated pdfs. I plot the results with:

fig, ax = plt.subplots()
az.plot_ppc(trace, ax=ax)
ax.hist(syn, bins=100, alpha=.3, density=True, label='data')
ax.legend(fontsize=10);
plt.xlim([0,60])

which gives:

enter image description here

which is not what I'm looking for. Instead, I'd like to sample from the posterior of alpha and beta to draw many gamma pdfs. I can do that by sampling and plotting lines, but I thought this has to be something that is already implemented with pymc3 or arviz, but I just don't know it. Thanks in advance if you could tell me how to plot what I want.


Solution

  • For this particular task, I'd recommend combining xarray (ArviZ's InferenceData is based on xarray Datasets) and scipy to generate the pdfs.

    If using the right dimensions so that everything broadcasts, scipy.stats.gamma.pdf can be used to generate the pdfs for specific values of alpha and beta. Given that the posterior is stored as an xarray Dataset, we can use xarray.apply_ufunc to handle the broadcasting so we can use scipy to generate the pdfs to plot.

    The first step is to store the xrange as an xarray object , otherwise xarray won't know how to correctly broadcast. The second is to generate the pdfs using apply_ufunc. Note that here I am generating pdfs for every single draw, below there is also a way to select a random subset.

    import scipy.stats as stats
    import xarray as xr
    
    xrange = xr.DataArray(np.linspace(0, 90, 100), dims="x")
    xr.apply_ufunc(
        lambda alpha, beta, x: stats.gamma(a=alpha, scale=1/beta).pdf(x),
        trace.posterior["alpha"], 
        trace.posterior["beta"], 
        xrange
    )
    

    To quickly plot only the pdfs corresponding to a subset of the draws there are several alternatives, here is one possibility using the idea above.

    # get random subset of the posterior
    rng = np.random.default_rng()
    idx = rng.choice(trace.posterior.alpha.size, 200)
    post = trace.posterior.stack(sample=("chain", "draw")).isel(sample=idx)
    pdfs = xr.apply_ufunc(
        lambda alpha, beta, x: stats.gamma(a=alpha, scale=1/beta).pdf(x),
        post["alpha"], post["beta"], xrange,
    )
    # plot results, for proper plotting, "x" dim must be the first
    plt.plot(xrange, pdfs.transpose("x", ...));
    

    enter image description here