python-2.7 pymc3

pm.Normal() in Python PyMC3


I was working on a simple Bayesian linear regression using PyMC3 in Python. While defining the likelihood, I came across this syntax:

likelihood = pm.Normal('Y', mu=intercept + x_coeff * df['x'], sd=sigma, observed=df['y'])

In the parameters of pm.Normal(), what does the "observed=" argument do? Please explain with examples if possible.


Solution

  • observed means that the value of the linear regression's response variable (typically named "y", but here confusingly named likelihood) is known, through observation, to be equal to df['y'].

    When the inference algorithm is run, the values in df['y'] are used to determine which values of the stochastic variables intercept and x_coeff are likely to have produced them. To do that, the algorithm uses the relationship the model specifies between them, namely that the observed variable is Normally distributed with mean intercept + x_coeff*df['x'] and standard deviation sigma.

    Note that df['y'] is typically an array with multiple observations. So the algorithm will try to infer the distributions of intercept and x_coeff likely to have induced these multiple observations df['y'].

    Note that the algorithm will not infer the values for df['x'] since that is also fixed, observed data.

    I mentioned that the variable is confusingly named likelihood instead of y. That is because pm.Normal creates a stochastic variable object, not a real-valued likelihood. I believe the name was chosen by tradition, because the observed values define a likelihood that the inference algorithm uses internally to infer the distributions of the other stochastic variables.

    In fact, in the PyMC introduction we see a similar definition using the name Y_obs instead:

    Y_obs = pm.Normal("Y_obs", mu=mu, sigma=sigma, observed=Y)
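
To make the role of the observed values more concrete, here is a minimal sketch in plain Python (no PyMC3) of the likelihood that an observed Normal variable contributes. The data, the fixed sigma, and the helper names normal_logpdf and log_likelihood are assumptions made for illustration and do not come from the question. The grid scan is only a crude stand-in for inference: PyMC3 draws samples from the posterior instead of scanning a grid, and priors are ignored here.

```python
import math

# Hypothetical data standing in for df['x'] and df['y'] (not from the question).
# y was built as roughly 1.0 + 2.0 * x with small fixed offsets.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
sigma = 0.5  # assumed known here to keep the sketch small


def normal_logpdf(value, mu, sd):
    """Log density of Normal(mu, sd) evaluated at value."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (value - mu) ** 2 / (2.0 * sd ** 2)


def log_likelihood(intercept, x_coeff):
    """Sum of log densities of the observed y values, given the parameters."""
    return sum(
        normal_logpdf(yi, intercept + x_coeff * xi, sigma)
        for xi, yi in zip(x, y)
    )


# Parameters close to the ones that generated the data score higher
# than parameters far away from them.
ll_near = log_likelihood(1.0, 2.0)
ll_far = log_likelihood(0.0, 0.0)

# Crude stand-in for inference: scan a grid of candidate parameter pairs
# and keep the pair under which the observed y values are most probable.
grid = [i / 10.0 for i in range(0, 41)]  # 0.0, 0.1, ..., 4.0
best_intercept, best_x_coeff = max(
    ((a, b) for a in grid for b in grid),
    key=lambda pair: log_likelihood(*pair),
)
```

The observed values never change during the scan. Only intercept and x_coeff vary, and each candidate pair is scored by how probable it makes the fixed y values. That is the sense in which observed= pins a variable to data and turns it into a likelihood term.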