I am trying to use the CTC loss function in my network, but don't quite understand when to feed the 'blank' label as a label.
I use it for gesture recognition as described by Molchanov, but what confuses me is that there is a 'no gesture' class as well.
The TensorFlow docs state:
The inputs Tensor's innermost dimension size, num_classes, represents num_labels + 1 classes, where num_labels is the number of true labels, and the largest value (num_classes - 1) is reserved for the blank label.
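To make that convention concrete for myself, here is the index arithmetic as a minimal sketch. The value of `num_labels` is made up for illustration and is not from my actual setup.

```python
# Assumption for illustration only: three real gesture labels.
num_labels = 3

# The innermost dimension of the inputs has one extra slot for the blank.
num_classes = num_labels + 1

# True labels occupy indices 0 .. num_labels - 1.
true_label_indices = list(range(num_labels))

# The largest index is reserved for the blank label.
blank_index = num_classes - 1

print(true_label_indices, blank_index)
```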
If I now use the 'blank' label to indicate that there is no gesture, I am limited in my training because of the error
Saw a non-null label (index >= num_classes - 1) following a null label
I am assuming that null label is the same as the blank label.
The problem is that when I feed data that starts with no gesture (mapped to the null label) and then contains a gesture, I get exactly this error. I can avoid it by adding two more labels next to my existing ones, one for 'no gesture' and one for the 'blank/null' label. Then I only feed the 'no gesture' label and never the 'blank' label, but this doesn't seem quite right.
So my question is, what should I use the 'blank/null' label for?
I can imagine that in language processing you would usually use the sentence-ending period as the 'null' label? But there is no ending gesture here, as it is one continuous stream.
Thank you
EDIT: I highly recommend reading this Distill article. "The ϵ (blank) token doesn’t correspond to anything and is simply removed from the output." It is used to 'interrupt' the merging of repeating tokens.
The blank label serves as a transitioning state between two classes.
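Here is a small sketch of that collapsing rule as I understand it from the article: merge consecutive repeats first, then drop the blanks. The function name `ctc_collapse` and the example paths are my own, and this is plain Python for illustration, not the TensorFlow decoder.

```python
from itertools import groupby


def ctc_collapse(path, blank):
    """Collapse a frame-wise path: merge consecutive repeats, then remove blanks."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(path)]
    # Step 2: drop every blank that is left.
    return [label for label in merged if label != blank]


BLANK = 3  # assumption: 3 real labels (0, 1, 2), blank is the largest index

# A blank between two 0s keeps them as two separate outputs.
with_blank = ctc_collapse([0, 0, BLANK, 0, 1, 1, BLANK], BLANK)

# Without the blank, the repeated 0s merge into a single output.
without_blank = ctc_collapse([0, 0, 0, 1, 1], BLANK)

print(with_blank, without_blank)
```

So the blank is what lets the network emit the same class twice in a row, which is why it should not carry a meaning such as 'no gesture'.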
To answer my own question: you don't assign the blank label to anything, but it still has to exist as a class. In my case, I added two more labels, one for the 'no gesture' class and one for the blank.
(At least that's how I did it, and I got some decent results.)
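Roughly, the label layout looks like the sketch below. The gesture names and the helper `encode_targets` are hypothetical, chosen only to illustrate the idea. 'no gesture' is an ordinary class that appears in the targets, while the blank only takes up the last index and is never used as a target.

```python
# Hypothetical gesture names, for illustration only.
GESTURES = ["swipe_left", "swipe_right", "click"]

# 'no_gesture' is a regular class that may appear in the targets.
CLASSES = GESTURES + ["no_gesture"]
INDEX = {name: i for i, name in enumerate(CLASSES)}

# One extra class for the blank, which takes the largest index.
NUM_CLASSES = len(CLASSES) + 1
BLANK = NUM_CLASSES - 1


def encode_targets(names):
    """Map a sequence of class names to target indices; never emits BLANK."""
    targets = [INDEX[name] for name in names]  # KeyError on unknown names
    if BLANK in targets:
        raise ValueError("the blank index must not appear in the targets")
    return targets


# A stream that starts with no gesture and then contains a gesture.
targets = encode_targets(["no_gesture", "swipe_left", "no_gesture"])
print(targets, "blank =", BLANK, "num_classes =", NUM_CLASSES)
```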