ocr lstm recurrent-neural-network ctc

Can an RNN / LSTM be used for OCR of non-standard text?


I have read about LSTMs, RNNs, and even CTC. From what I understand, an RNN is used to predict a missing token in a sequence (e.g. a word in a sentence). However, my problem is reading person names written in cursive script. Many of these names are uncommon and will not appear in a language model. So if an RNN only predicts missing words, it will not succeed, since I don't have a complete dataset of possible person names, right?

1) Can I use an RNN to recognize non-standard words (e.g. rare or unpopular person names)?

2) If not, is there any alternative? Or must I use the traditional OCR approach (segment the image first, then classify individual characters)?


Solution

  • Neural networks built from CNN + RNN + CTC work at the character level. They learn to predict character strings and do not depend on words or the underlying language. You can integrate a dictionary and/or a language model into the CTC decoder, but you don't have to. As a result, such networks can read arbitrary person names just by looking at the characters. For a high-level introduction to text recognition with such neural network models, see https://towardsdatascience.com/2326a3487cd5

    One additional note: the RNN's role here is to propagate information along the sequence, e.g. to work out what an ambiguous-looking character might be from its surroundings.
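To make the "no dictionary needed" point concrete, here is a minimal sketch of CTC best-path (greedy) decoding in plain Python. It is not taken from the linked article. The alphabet, the use of index 0 as the blank label, and the hand-made per-frame scores are all assumptions for illustration; in a real system the scores would come from the CNN + RNN output. The decoder only picks the most likely character per time step, merges repeats, and drops blanks, so it can emit a string that appears in no word list.

```python
# Hypothetical setup: index 0 is the CTC blank, the rest are characters.
BLANK = 0
ALPHABET = ["", "a", "n", "x"]


def ctc_greedy_decode(probs, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Most likely label index at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


def frame(idx, size=len(ALPHABET)):
    """Hand-made score vector that strongly favours label `idx` (illustrative)."""
    return [0.9 if i == idx else 0.1 / (size - 1) for i in range(size)]


# Frames: x x - a n - n a a   (the blank between the two n's keeps them separate)
probs = [frame(i) for i in [3, 3, 0, 1, 2, 0, 2, 1, 1]]
print(ctc_greedy_decode(probs, ALPHABET))  # xanna
```

A dictionary or language model would only enter in a more elaborate decoder (for example a beam search that rescores candidates); the greedy path above uses the character scores alone.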