Using the great TensorFlow Hidden Markov Model library, it is straightforward to model the following Dynamic Bayesian Network:
where Hi is the random variable that represents the hidden state of the HMM and Si is the random variable that represents the observations.
What if I'd like to make H depend on yet another HMM (a hierarchical HMM), or simply on another random variable C, like this:
The HiddenMarkovModel definition in TensorFlow Probability looks like the following:
tfp.distributions.HiddenMarkovModel(
    initial_distribution, transition_distribution, observation_distribution,
    num_steps, validate_args=False, allow_nan_stats=True,
    time_varying_transition_distribution=False,
    time_varying_observation_distribution=False, name='HiddenMarkovModel'
)
It only accepts initial, transition and observation distributions.
How could I model the above and pass the distribution of an additional random variable to the HiddenMarkovModel? Is that possible by somehow incorporating C into the transition_distribution parameter?
Maybe C should be treated as an observation as well? (I'm not sure, though, whether that would be fully equivalent to the structure I'd like to model.)
A simple example / explanation would be great to have.
UPDATE
I've tried building a simple joint distribution of two dependent variables and feeding it as transition_distribution into the HMM:
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Conditional distribution of the second variable given the first (y is 0 or 1).
def mydist(y):
    samples_length = 1 if tf.rank(y) == 0 else y.shape[0]
    b = tf.ones([samples_length], dtype=tf.int32) - y
    a = tf.reshape(y, [samples_length, 1])
    b = tf.reshape(b, [samples_length, 1])
    c = tf.concat([a, b], axis=1)
    condprobs = tf.constant([[0.1, 0.9], [0.5, 0.5]])
    d = tf.matmul(tf.cast(c, tf.float32), condprobs)
    return tfd.Categorical(d, dtype=tf.int32)

jd = tfd.JointDistributionSequential([
    tfd.Categorical(probs=[0.9, 0.1]),
    lambda y: mydist(y)
], validate_args=True)

initial_distribution = tfd.Categorical(probs=[0.8, 0.2])
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                 [0.2, 0.8]])
observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])

# The joint distribution jd is passed in place of the Categorical transition_distribution.
model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=jd,
    observation_distribution=observation_distribution,
    num_steps=7)

temps = [-2., 0., 2., 4., 6., 8., 10.]
model.posterior_mode(temps)
This gives an error:
ValueError: If the two shapes can not be broadcasted.
AttributeError: 'list' object has no attribute 'ndims'
The HMM manual mentions:
"This model assumes that the transition matrices are fixed over time."
and that transition_distribution must be
"A Categorical-like instance. The rightmost batch dimension indexes the probability distribution of each hidden state conditioned on the previous hidden state."
which tfd.JointDistributionSequential is probably not.
Still looking for ways of building hierarchical HMMs with TensorFlow.
The TFP HiddenMarkovModel implements message passing algorithms for chain-structured graphs, so it can't natively handle the graph in which the Cs are additional latent variables. I can think of a few approaches:
Fold the Cs into the hidden state H, blowing up the state size (that is, if H took values in 1, ..., N and C took values in 1, ..., M, the new combined state would take values in 1, ..., NM).
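As a rough illustration of the folding approach, here is a minimal sketch. It assumes that C is i.i.d. over time and is a parent of H at each step, so that P(H_t | H_{t-1}, C_t) is a table indexed by C. All numbers and the names `p_c`, `p_h`, `p_h0`, `split_state` and `build_folded_hmm` are made up for illustration; they are not part of TFP.

```python
import numpy as np

# Assumed toy tables: C has M = 2 values, H has N = 2 values.
# p_c[c]            = P(C_t = c)   (C is assumed i.i.d. over time)
# p_h[c, h_prev, h] = P(H_t = h | H_{t-1} = h_prev, C_t = c)
# p_h0[h]           = P(H_0 = h)   (assumed independent of C_0)
p_c = np.array([0.9, 0.1], dtype=np.float32)
p_h = np.array([[[0.7, 0.3],
                 [0.2, 0.8]],
                [[0.1, 0.9],
                 [0.5, 0.5]]], dtype=np.float32)
p_h0 = np.array([0.8, 0.2], dtype=np.float32)
M, N = p_h.shape[0], p_h.shape[1]

# Combined state z = c * N + h, so z takes N * M distinct values.
# P(z' | z) = P(c') * P(h' | h, c'); the previous c plays no role here.
trans = np.zeros((M * N, M * N), dtype=np.float32)
for c_prev in range(M):
    for h_prev in range(N):
        for c in range(M):
            for h in range(N):
                trans[c_prev * N + h_prev, c * N + h] = p_c[c] * p_h[c, h_prev, h]

# Initial distribution over the combined state.
init = (p_c[:, None] * p_h0[None, :]).reshape(-1)

# Observations depend on H only, so the per-H parameters repeat for every c.
loc = np.tile(np.array([0., 15.], dtype=np.float32), M)
scale = np.tile(np.array([5., 10.], dtype=np.float32), M)


def split_state(z):
    """Decode a combined state index back into the pair (c, h)."""
    return z // N, z % N


def build_folded_hmm(num_steps):
    """Wrap the folded tables in an ordinary HiddenMarkovModel (requires TFP)."""
    import tensorflow_probability as tfp
    tfd = tfp.distributions
    return tfd.HiddenMarkovModel(
        initial_distribution=tfd.Categorical(probs=init),
        transition_distribution=tfd.Categorical(probs=trans),
        observation_distribution=tfd.Normal(loc=loc, scale=scale),
        num_steps=num_steps)
```

With TFP installed, `build_folded_hmm(7).posterior_mode(np.float32(temps))` should return combined state indices, which `split_state` turns back into (C, H) pairs. If C is itself a Markov chain, the factor `p_c[c]` would be replaced by a table indexed by `c_prev` as well.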
Model the chain conditioned on values for the Cs that are set by some approximate inference algorithm. For example, if the Cs are continuous, you could fit them using gradient-based VI or MCMC:
@tfd.JointDistributionCoroutineAutoBatched
def model():
    Cs = yield tfd.Sample(SomePrior, num_timesteps)
    Ss = yield tfd.HiddenMarkovModel(
        ...,
        transition_distribution=SomeDistribution(Cs),
        time_varying_transition_distribution=True)

# Fit Cs using gradient-based VI (could also use HMC).
pinned = tfp.experimental.distributions.JointDistributionPinned(model, Ss=observations)
surrogate_posterior = tfp.experimental.vi.build_factored_surrogate_posterior(
    event_shape=pinned.event_shape,
    bijector=pinned.experimental_default_event_space_bijector())
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=pinned.unnormalized_log_prob,
    surrogate_posterior=surrogate_posterior,
    optimizer=tf.optimizers.Adam(0.1),
    num_steps=200)
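To make the role of the transition distribution built from the Cs a bit more concrete, here is a small NumPy sketch of one way continuous Cs could be mapped to one transition matrix per time step. The additive-logit form and the names `base_logits`, `effect` and `transition_probs` are assumptions chosen for illustration, not something TFP prescribes.

```python
import numpy as np

num_steps, num_states = 7, 2

# Baseline transition matrix (used as-is when C = 0), stored as logits.
base_logits = np.log(np.array([[0.7, 0.3],
                               [0.2, 0.8]]))
# Hypothetical effect of C on the logit of each next state.
effect = np.array([0.0, 1.0])


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transition_probs(cs):
    """Map one C value per time step to per-step transition matrices.

    Returns an array of shape [num_steps - 1, num_states, num_states].
    The transition into step t + 1 is assumed to be driven by C_{t+1},
    so C_0 does not enter any transition.
    """
    cs = np.asarray(cs, dtype=np.float64)
    logits = base_logits[None, :, :] + cs[1:, None, None] * effect[None, None, :]
    return softmax(logits, axis=-1)


probs = transition_probs(np.zeros(num_steps))
```

Inside the joint model, the same computation written with TensorFlow ops could be passed as `tfd.Categorical(probs=...)`. As far as I understand the TFP documentation, with `time_varying_transition_distribution=True` the second-to-rightmost batch dimension indexes time and has length `num_steps - 1`, which is why `C_0` is dropped in this sketch.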
Use a particle filter to infer the Cs and Hs jointly, treating the pair (C, H) as the state. A sketch, in which PriorOnC and a two-argument version of mydist are placeholders you would need to define:
[
    trajectories,
    incremental_log_marginal_likelihoods
] = tfp.experimental.mcmc.infer_trajectories(
    observations=observations,
    initial_state_prior=tfd.JointDistributionSequential(
        [PriorOnC(),
         lambda c: mydist(c, previous_state=None)]),
    transition_fn=lambda step, state: tfd.JointDistributionSequential(
        [PriorOnC(),
         lambda c: mydist(c, previous_state=state)]),
    observation_fn=lambda step, state: observation_distribution[state[1]],
    num_particles=4096)
This gives up exact inference over the discrete chain, but it's probably the most flexible approach for working with dynamic Bayesian networks in general.