python tensorflow keras ocr ctc

How to save the OCR model from the Keras example by A_K_Nain


I'm studying the TensorFlow OCR model from the Keras example authored by A_K_Nain: https://keras.io/examples/vision/captcha_ocr/ This model uses a custom object (the CTC layer). I trained the model on my own dataset, and its predictions were perfect. I then tried to save and load the model, but I got errors, so I added this method to the CTCLayer class:

def get_config(self):
    config = super(CTCLayer, self).get_config()
    config.update({"name":self.name})
    return config

After that I tried to save the whole model and the weights, but nothing worked. So I tried saving at two different points. First way:

history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=70,
    callbacks=[early_stopping],
)

model.save('./model/my_model')

---------------------------------------

new_model = keras.models.load_model('./model/my_model', custom_objects={'CTCLayer': CTCLayer})

prediction_model = keras.models.Model(
  new_model.get_layer(name='image').input, new_model.get_layer(name='dense2').output
)

Second way:

prediction_model = keras.models.Model(
  model.get_layer(name='image').input, model.get_layer(name='dense2').output
)

prediction_model.save('./model/my_model')

Neither approach worked. There was no error, but the predictions were terrible. I get accurate results only when training, saving, and loading all happen in the same session. If I load the saved model in a new session without retraining, the results are very bad.

How can I use this model without retraining it every time? Please help.


Solution

  • The problem does not come from TensorFlow. In the captcha_ocr tutorial, characters is a set, and sets are unordered. So the mapping from characters to integers built by StringLookup depends on the current run of the notebook. That is why you get rubbish when using the model in another notebook without retraining: the mapping is not the same.
    A solution is to use a sorted list instead of the set for characters:

    characters = sorted(list(set([char for label in labels for char in label])))
    

    Note that set is used here to keep a single copy of each character; the result is then converted back to a list and sorted. The mapping will then be the same in any script or notebook without retraining, as long as characters is built with the same expression from the same labels.
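To illustrate why the ordering matters, here is a minimal pure-Python sketch. It is not the tutorial's code: the labels, the helper names (build_vocabulary, encode, decode) and the characters.json file are hypothetical, and the index scheme only loosely mimics StringLookup by starting the vocabulary at 1. As an optional extra that the answer above does not require, the sketch also stores the sorted vocabulary in a JSON file so that the inference script does not need the training labels.

```python
import json
import os
import tempfile

# Hypothetical labels standing in for the captcha labels used in the tutorial.
labels = ["2b827", "d7c5x", "x4gg5"]


def build_vocabulary(labels):
    """Return the unique characters of all labels in a fixed, sorted order."""
    return sorted(set(char for label in labels for char in label))


def encode(text, vocabulary):
    """Map each character to its 1-based position in the vocabulary."""
    index = {char: i + 1 for i, char in enumerate(vocabulary)}
    return [index[char] for char in text]


def decode(indices, vocabulary):
    """Map 1-based indices back to characters."""
    return "".join(vocabulary[i - 1] for i in indices)


characters = build_vocabulary(labels)

# Assumption: the vocabulary is written to a file kept next to the saved model.
vocab_path = os.path.join(tempfile.mkdtemp(), "characters.json")
with open(vocab_path, "w", encoding="utf-8") as f:
    json.dump(characters, f)

# Later, in the inference script: reload the vocabulary instead of rebuilding it.
with open(vocab_path, encoding="utf-8") as f:
    restored = json.load(f)

encoded = encode("d7c5x", characters)
print(decode(encoded, restored))  # d7c5x

# The same indices decoded with a different character order give another string,
# which is what happens when an unordered set changes order between runs.
reordered = list(reversed(characters))
print(decode(encoded, reordered))
```

With the vocabulary fixed this way, the saved weights and the character mapping stay in agreement between the training run and any later run.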