nlp speech-recognition speech-to-text state-machine kaldi

If an FST transition is based on a given context, how can it be called 'non-deterministic'?


I am going through the paper 'SPEECH RECOGNITION WITH WEIGHTED FINITE-STATE TRANSDUCERS' (hbka.pdf - https://cs.nyu.edu/~mohri/pub/hbka.pdf)

On page 9, I do not understand why the transition in figure 6(a) is considered 'non-deterministic'.

[Here ae/k_t represents the triphonic model for ae with left context k and right context t. This transition is considered non-deterministic. Why?]

As I understand it, the differences between transitions that are termed 'non-deterministic' and 'deterministic' are:

  1. Non-deterministic transitions allow epsilon transitions, that is, a transition between states that consumes no input symbol and/or generates no output symbol.

This is especially useful where the transducer is supposed to output the morphological description of an English word from the input word, as in cities -> city -PL. Here most transitions between states output the same symbol they read. But when you input i or e you need not output anything, and in the final transition, for input s, you would output y -PL.

So, I understand the need for epsilon transitions. I also understand that a machine is non-deterministic when multiple transitions with the same input label can leave the same state. This can lead to ambiguity, which justifies the name 'non-deterministic'.

  2. Deterministic transitions are the very opposite of the above. No two transitions leaving a state share an input label, so every transition is unique. There are no epsilon transitions.
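The cities -> city -PL example above can be sketched as a toy transducer. This is only a minimal illustration under my own assumptions: the state numbers, the `ARCS` table and the `transduce` helper are hypothetical names, not taken from the paper or from any FST library.

```python
EPS = ""  # epsilon on the output side: nothing is emitted

# Toy transducer arcs: (state, input symbol) -> (next state, output string).
# State numbers are arbitrary and chosen only for this illustration.
ARCS = {
    (0, "c"): (1, "c"),
    (1, "i"): (2, "i"),
    (2, "t"): (3, "t"),
    (3, "i"): (4, EPS),      # read i, output nothing
    (4, "e"): (5, EPS),      # read e, output nothing
    (5, "s"): (6, "y -PL"),  # read s, output the delayed y plus the plural tag
}
FINAL_STATES = {6}


def transduce(word):
    """Return the output string for `word`, or None if the word is rejected."""
    state, output = 0, []
    for symbol in word:
        if (state, symbol) not in ARCS:
            return None
        state, emitted = ARCS[(state, symbol)]
        if emitted != EPS:
            output.append(emitted)
    return "".join(output) if state in FINAL_STATES else None


print(transduce("cities"))  # city -PL
```

Note that this particular sketch has at most one arc per (state, input symbol) pair, so an empty output on an arc does not by itself create any ambiguity about which arc to follow.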

With this limited understanding, I could not work out why a transition could ever be called 'non-deterministic' once the context has been provided. Here ae/k_t represents the triphonic model for ae with left context k and right context t. This transition is considered non-deterministic. Why?

The main idea of providing context is to remove the ambiguity.


Solution

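  • The text to the right of the colon (:) is the output of the transition, not its input. So it doesn't help determine which transition to take; rather, you need to take the right transition in order to get the right output. Determinism is judged on the input labels alone, so a state with several outgoing transitions that share an input label is non-deterministic even when their outputs differ.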
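To make the point concrete, here is a rough sketch of a determinism check that looks only at input labels. The arc lists are hypothetical and only loosely modelled on the situation in the question (they are not copied from figure 6), and `is_input_deterministic` is a helper written for this answer, not a function from Kaldi or OpenFst.

```python
from collections import defaultdict

EPS = "<eps>"  # assumed spelling for the epsilon label in this sketch


def is_input_deterministic(arcs):
    """arcs: iterable of (source, input_label, output_label, destination).

    Returns True if no state has an epsilon-input arc or two arcs
    with the same input label. Output labels are ignored on purpose.
    """
    seen = defaultdict(set)
    for source, ilabel, _olabel, _dest in arcs:
        if ilabel == EPS or ilabel in seen[source]:
            return False
        seen[source].add(ilabel)
    return True


# Hypothetical arcs: the input is the phone ae, the output is the triphone.
# Both arcs leave the same state on the same input, so reading ae does not
# tell us which one to take; the right context (t or n) has to be guessed.
ambiguous = [
    (("k", "ae"), "ae", "ae/k_t", ("ae", "t")),
    (("k", "ae"), "ae", "ae/k_n", ("ae", "n")),
]

# Hypothetical variant where the input label is the following phone instead,
# so every arc leaving the state has a distinct input label.
unambiguous = [
    (("k", "ae"), "t", "ae/k_t", ("ae", "t")),
    (("k", "ae"), "n", "ae/k_n", ("ae", "n")),
]

print(is_input_deterministic(ambiguous))    # False
print(is_input_deterministic(unambiguous))  # True
```

The context in ae/k_t sits on the output side in the first list, which is why it cannot be used to choose between the two arcs.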