Tags: tensorflow, hidden-markov-models, tensorflow-probability

TensorFlow Hidden Markov Model with more complex structure


Using the excellent HiddenMarkovModel distribution from TensorFlow Probability, it is straightforward to model the following dynamic Bayesian network:

[Figure: a chain-structured DBN in which each hidden state H_i emits an observation S_i]

where H_i are the random variables representing the hidden states of the HMM and S_i are the random variables representing the observations.

What if I'd like to make H depend on yet another HMM (a hierarchical HMM), or simply on some other random variable, like this:

[Figure: the same chain, but with each hidden state H_i additionally depending on a variable C_i]

The HiddenMarkovModel constructor in TensorFlow Probability looks like the following:

tfp.distributions.HiddenMarkovModel(
    initial_distribution, transition_distribution, observation_distribution,
    num_steps, validate_args=False, allow_nan_stats=True,
    time_varying_transition_distribution=False,
    time_varying_observation_distribution=False, name='HiddenMarkovModel'
)

It only accepts initial, transition and observation distributions.
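
For reference, a plain chain HMM (the first picture) can be built and queried along these lines; the numbers are toy values in the style of the TFP docs:

import tensorflow_probability as tfp

tfd = tfp.distributions

# A plain chain HMM: two hidden states with Gaussian emissions (toy values).
hmm = tfd.HiddenMarkovModel(
    initial_distribution=tfd.Categorical(probs=[0.8, 0.2]),
    transition_distribution=tfd.Categorical(probs=[[0.7, 0.3],
                                                   [0.2, 0.8]]),
    observation_distribution=tfd.Normal(loc=[0., 15.], scale=[5., 10.]),
    num_steps=7)

observations = [-2., 0., 2., 4., 6., 8., 10.]
hmm.log_prob(observations)             # marginal likelihood of the observations
hmm.posterior_marginals(observations)  # p(H_i | S_1, ..., S_n) at each step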

How could I model the hierarchical structure from the second picture and pass an additional random variable's distribution to the HiddenMarkovModel? Is that possible by somehow incorporating C into the transition_distribution parameter? Or should C be treated as an observation as well? (I'm not sure, though, whether that would be fully equivalent to the structure I'd like to model.)

A simple example / explanation would be great to have.

UPDATE

I've tried building a simple joint distribution of two dependent variables and feeding it to the HMM as the transition_distribution:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def mydist(y):
  # Distribution of the second variable conditioned on the first (y):
  # encode y as [y, 1 - y] and use it to select a row of conditional probabilities.
  samples_length = 1 if tf.rank(y) == 0 else y.shape[0]
  b = tf.ones([samples_length], dtype=tf.int32) - y
  a = tf.reshape(y, [samples_length, 1])
  b = tf.reshape(b, [samples_length, 1])
  c = tf.concat([a, b], axis=1)

  condprobs = tf.constant([[0.1, 0.9], [0.5, 0.5]])
  d = tf.matmul(tf.cast(c, tf.float32), condprobs)
  return tfd.Categorical(probs=d, dtype=tf.int32)  # d holds probabilities, so pass probs= rather than logits

jd = tfd.JointDistributionSequential([
    tfd.Categorical(probs=[0.9, 0.1]),
    lambda y: mydist(y)
], validate_args=True)


initial_distribution = tfd.Categorical(probs=[0.8, 0.2])

# Plain transition matrix, kept for comparison; jd is passed to the model below instead.
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                 [0.2, 0.8]])

observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])

model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=jd,
    observation_distribution=observation_distribution,
    num_steps=7)

temps = [-2., 0., 2., 4., 6., 8., 10.]

model.posterior_mode(temps)

This gives an error:

ValueError: If the two shapes can not be broadcasted. AttributeError: 'list' object has no attribute 'ndims'

The HiddenMarkovModel documentation mentions:

This model assumes that the transition matrices are fixed over time.

And that transition_distribution must be

A Categorical-like instance. The rightmost batch dimension indexes the probability distribution of each hidden state conditioned on the previous hidden state.

which tfd.JointDistributionSequential is probably not.
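
A quick check seems to confirm this (using the jd defined above):

# The transition distribution the HMM expects reports a single batch shape,
# with one component distribution per previous hidden state:
tfd.Categorical(probs=[[0.7, 0.3], [0.2, 0.8]]).batch_shape  # (2,)

# ...whereas the joint distribution reports a structure of batch shapes
# (a Python list with one entry per component), which would explain the
# "'list' object has no attribute 'ndims'" part of the error.
jd.batch_shape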

Still looking for a way of building hierarchical HMMs with TensorFlow.


Solution

  • The TFP HiddenMarkovModel implements message passing algorithms for chain-structured graphs, so it can't natively handle the graph in which the Cs are additional latent variables. I can think of a few approaches:

    1. Fold the Cs into the hidden state H, blowing up the state size. (That is, if H took values in 1, ..., N and C took values in 1, ..., M, the new combined state would take values in 1, ..., NM.) A concrete sketch of this folding is given after this list.

    2. Model the chain conditioned on values for the Cs that are set by some approximate inference algorithm. For example, if the Cs are continuous, you could fit them using gradient-based VI or MCMC:

    @tfd.JointDistributionCoroutineAutoBatched
    def model():
      Cs = yield tfd.Sample(SomePrior, num_timesteps)
      Ss = yield tfd.HiddenMarkovModel(
        ..., 
        transition_distribution=SomeDistribution(Cs), 
        time_varying_transition_distribution=True)
    
    
    # Fit Cs using gradient-based VI (could also use HMC). 
    pinned = tfp.experimental.distributions.JointDistributionPinned(model, Ss=observations)
    surrogate_posterior = tfp.experimental.vi.build_factored_surrogate_posterior(
      event_shape=pinned.event_shape,
      bijector=pinned.experimental_default_event_space_bijector())
    losses = tfp.vi.fit_surrogate_posterior(
      target_log_prob_fn=pinned.unnormalized_log_prob,
      surrogate_posterior=surrogate_posterior,
      optimizer=tf.optimizers.Adam(0.1),
      num_steps=200)
    
    3. Use a particle filter, which can handle arbitrary joint distributions and dependencies in the transition and observation models:
    [ 
      trajectories,
      incremental_log_marginal_likelihoods
    ] = tfp.experimental.mcmc.infer_trajectories(
        observations=observations,
        initial_state_prior=tfd.JointDistributionSequential(
          [PriorOnC(),
           lambda c: mydist(c, previous_state=None)]),
        transition_fn=lambda step, state: tfd.JointDistributionSequential(
          [PriorOnC(),
           lambda c: mydist(c, previous_state=state)]),
        observation_fn=lambda step, state: observation_distribution[state[1]],
        num_particles=4096)
    

    This gives up exact inference over the discrete chain, but it's probably the most flexible approach for working with dynamic Bayesian networks in general.
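
To make option 1 concrete, here's a minimal sketch of folding C into the hidden state. It assumes, purely for illustration, that C is itself a two-state Markov chain, that H's transition matrix depends on the current value of C, and that observations depend only on H; all the numbers are made up.

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Illustrative sizes: H has N = 2 states, C has M = 2 states, so the combined
# state Z = (H, C) has N * M = 4 states, flattened as z = h * M + c.
N, M = 2, 2

# p(C_t = c' | C_{t-1} = c): an ordinary M x M transition matrix for C.
c_transition = tf.constant([[0.9, 0.1],
                            [0.3, 0.7]])

# p(H_t = h' | H_{t-1} = h, C_t = c): one N x N transition matrix per value of C.
h_transition_given_c = tf.constant([[[0.7, 0.3],   # C = 0
                                     [0.2, 0.8]],
                                    [[0.5, 0.5],   # C = 1
                                     [0.5, 0.5]]])

# Combined transition p(Z_t | Z_{t-1}) = p(C_t | C_{t-1}) * p(H_t | H_{t-1}, C_t),
# built as a tensor indexed [h, c, h', c'] and reshaped to (N*M) x (N*M).
combined_transition = tf.reshape(
    tf.einsum('cd,dhi->hcid', c_transition, h_transition_given_c),
    [N * M, N * M])

# Observations depend only on H, so each combined state reuses H's parameters.
loc = tf.repeat(tf.constant([0., 15.]), M)    # [0., 0., 15., 15.]
scale = tf.repeat(tf.constant([5., 10.]), M)  # [5., 5., 10., 10.]

# Independent priors over H_0 and C_0, flattened the same way.
initial_probs = tf.reshape(
    tf.constant([0.8, 0.2])[:, tf.newaxis] * tf.constant([0.5, 0.5]),
    [N * M])

model = tfd.HiddenMarkovModel(
    initial_distribution=tfd.Categorical(probs=initial_probs),
    transition_distribution=tfd.Categorical(probs=combined_transition),
    observation_distribution=tfd.Normal(loc=loc, scale=scale),
    num_steps=7)

# Posterior mode over the combined state; recover H and C with // and %.
z_star = model.posterior_mode(tf.constant([-2., 0., 2., 4., 6., 8., 10.]))
h_star, c_star = z_star // M, z_star % M

Because the combined chain is still an ordinary HMM, all of HiddenMarkovModel's exact inference methods (log_prob, posterior_marginals, posterior_mode) remain available; the price is that the state space, and hence the transition matrix, grows to NM states.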