Given the logits (the output of the RNN/LSTM/GRU in time-major format, i.e. (maxTime, batchSize, numberOfClasses)), how does ctc_greedy_decoder perform decoding to generate the output sequence?
The documentation page, https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder, only says: "Performs greedy decoding on the logits given in input (best path)".
One possibility is to select the output class with the maximum value at each time-step, collapse repetitions, and generate the corresponding output sequence. Is that what ctc_greedy_decoder does, or something else? An explanation with an example would be very helpful.
The operation ctc_greedy_decoder implements best path decoding, which is also stated in the TF source code [1].
Decoding is done in two steps: first, concatenate the most probable character per time-step, which yields the best path; then, undo the CTC encoding by first removing repeated characters and then removing all blanks from the path.
Let's look at an example. The neural network outputs a matrix with 5 time-steps and 3 characters ("a", "b" and the blank "-"). We take the most likely character per time-step, which gives us the best path: "aaa-b". Then, we remove repeated characters and get "a-b". Finally, we remove all blanks and get "ab" as the result.
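If it helps, here is a minimal NumPy sketch of exactly these two steps; the matrix values are made up so that they reproduce the "aaa-b" example above:

```python
import numpy as np

# Toy output matrix: 5 time-steps x 3 classes ("a", "b", blank "-"),
# with values chosen so the best path is "aaa-b" as in the example.
chars = ["a", "b", "-"]          # blank is the last class, as in TF's CTC ops
logits = np.array([
    [0.8, 0.1, 0.1],   # t=0 -> "a"
    [0.7, 0.2, 0.1],   # t=1 -> "a"
    [0.6, 0.3, 0.1],   # t=2 -> "a"
    [0.2, 0.3, 0.5],   # t=3 -> "-"
    [0.1, 0.8, 0.1],   # t=4 -> "b"
])

# Step 1: best path = most likely class per time-step
best_path = [chars[i] for i in np.argmax(logits, axis=1)]   # ['a','a','a','-','b']

# Step 2: collapse repeated characters, then remove all blanks
collapsed = [c for i, c in enumerate(best_path) if i == 0 or c != best_path[i - 1]]
decoded = "".join(c for c in collapsed if c != "-")

print("".join(best_path))  # aaa-b
print(decoded)             # ab
```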
More information about CTC can be found in [2], and an example of how to use it in Python is shown in [3].
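For completeness, here is a minimal usage sketch of tf.nn.ctc_greedy_decoder itself (assuming TF 2.x; the shapes and random logits are made up for illustration):

```python
import tensorflow as tf

max_time, batch_size, num_classes = 5, 1, 3    # the blank is class num_classes - 1

# Time-major logits, as expected by the op: [max_time, batch_size, num_classes]
logits = tf.random.normal([max_time, batch_size, num_classes])
seq_len = tf.fill([batch_size], max_time)      # length of each sequence in the batch

# decoded is a list with a single SparseTensor holding the label sequences;
# neg_sum_logits holds, per batch element, the negative sum of the maximum
# logits along the best path
decoded, neg_sum_logits = tf.nn.ctc_greedy_decoder(logits, seq_len)

labels = tf.sparse.to_dense(decoded[0])        # [batch_size, max_decoded_length]
print(labels.numpy())
```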
[1] Implementation of ctc_greedy_decoder: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/ctc/ctc_decoder.h#L96
[2] Further information about CTC, best path decoding and beam search decoding: https://harald-scheidl.medium.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7
[3] Sample code which shows how to use ctc_greedy_decoder: https://github.com/githubharald/SimpleHTR/blob/master/src/model.py#L129