I am using a model that I trained myself to translate Braille characters into plain text. This is a classification problem with 26 classes, one for each letter of the alphabet.
This is the dataset that I used to train my model: https://www.kaggle.com/datasets/shanks0465/braille-character-dataset
This is how I am generating my training and validation set:
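import os
from shutil import copyfile

# Create one subfolder per letter: ./images/a ... ./images/z
os.mkdir('./images/')
alpha = 'a'
for i in range(0, 26):
    os.mkdir('./images/' + alpha)
    alpha = chr(ord(alpha) + 1)

# Copy each image into the subfolder named after the first character of its file name
rootdir = "C:\\Users\\ffernandez\\Downloads\\capstoneProject\\Braille Dataset\\Braille Dataset\\"
for file in os.listdir(rootdir):
    letter = file[0]
    copyfile(rootdir+file, './images/' + letter + '/' + file)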
The resulting folder looks like this: [image: folder structure]
And this is how I create the train and validation split:
datagen = ImageDataGenerator(rotation_range=20,
                             shear_range=10,
                             validation_split=0.2)
train_generator = datagen.flow_from_directory('./images/',
                                              target_size=(28,28),
                                              subset='training')
val_generator = datagen.flow_from_directory('./images/',
                                            target_size=(28,28),
                                            subset='validation')
Finally this is the code corresponding to the design, compilation and training of the model:
K.clear_session()
model_ckpt = ModelCheckpoint('BrailleNet.h5',save_best_only=True)
reduce_lr = ReduceLROnPlateau(patience=8,verbose=0)
early_stop = EarlyStopping(patience=15,verbose=1)
entry = L.Input(shape=(28,28,3))
x = L.SeparableConv2D(64,(3,3),activation='relu')(entry)
x = L.MaxPooling2D((2,2))(x)
x = L.SeparableConv2D(128,(3,3),activation='relu')(x)
x = L.MaxPooling2D((2,2))(x)
x = L.SeparableConv2D(256,(2,2),activation='relu')(x)
x = L.GlobalMaxPooling2D()(x)
x = L.Dense(256)(x)
x = L.LeakyReLU()(x)
x = L.Dense(64,kernel_regularizer=l2(2e-4))(x)
x = L.LeakyReLU()(x)
x = L.Dense(26,activation='softmax')(x)
model = Model(entry,x)
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
history = model.fit_generator(train_generator,validation_data=val_generator,epochs=666,
                              callbacks=[model_ckpt,reduce_lr,early_stop],verbose=0)
Then this is the code for testing the model on an image of the letter 'a' in Braille, which has the same size as the training and validation images (28x28):
img_path = "./test/a1.JPG10whs.jpg"
img = plt.imread(img_path)
img_array = tf.keras.utils.img_to_array(img)
img_batch = np.expand_dims(img_array, axis=0)
img_preprocessed = tf.keras.applications.resnet50.preprocess_input(img_batch)
prediction = model.predict(img_preprocessed)
print(tf.keras.applications.imagenet_utils.decode_predictions(prediction, top=3)[0])
When I execute that last line of code, this error appears:
ValueError: `decode_predictions` expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 26)
I found a similar question here on Stack Overflow (ValueError: `decode_predictions` expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 7)).
I understand that "decode_predictions" only makes sense if the model outputs the 1000 ImageNet classes, but if I can't use "decode_predictions", I don't know how to turn my model's output into a predicted letter.
My desired output would be like:
prediction = model.predict(img_preprocessed)
print(prediction)
output: 'a'
Any hint or suggestion on how to solve this issue is highly appreciated.
If we take a look at what the prediction object actually is, we can see that it has 26 values. These values are the probabilities that the model predicts for each letter.
So we need a way to map the prediction values to the respective letter. A simple way to do this could be to create a list of all 26 possible letters and look up the one at the index of the max value in the prediction array. Example:
# Create prediction labels from a-z
alpha = "a"
labels = ["a"]
for i in range(0, 25):
    alpha = chr(ord(alpha) + 1)
    labels.append(alpha)

# Search the max value in prediction
labels[np.argmax(prediction)]
The output should be the character with the highest probability.
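If you also want something closer to what `decode_predictions(..., top=3)` returns, the same lookup can be extended to the top-k letters. The following is only a minimal sketch under some assumptions: `decode_top_k` and `fake_prediction` are hypothetical names made up for illustration, and the hand-built `class_indices` dict stands in for `train_generator.class_indices`, which `flow_from_directory` fills from the subfolder names in alphanumeric order (so 'a' to 'z' map to 0 to 25 here). It uses plain Python indexing, so a NumPy array returned by `model.predict` should work as `prediction` too.

```python
import string


def decode_top_k(prediction, class_indices, top=3):
    """Return the `top` most likely (label, probability) pairs per sample.

    prediction: 2D batch of shape (samples, n_classes), e.g. model.predict output
    class_indices: dict mapping label -> column index, in the same format as
        train_generator.class_indices
    """
    # Invert the mapping so a column index can be looked up directly.
    index_to_label = {index: label for label, index in class_indices.items()}
    decoded = []
    for row in prediction:
        # Column indices sorted by descending probability.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        decoded.append([(index_to_label[i], float(row[i])) for i in ranked[:top]])
    return decoded


# Assumption: stand-in for train_generator.class_indices ({'a': 0, ..., 'z': 25}).
class_indices = {letter: i for i, letter in enumerate(string.ascii_lowercase)}

# Made-up prediction for a single image, only to exercise the function.
fake_prediction = [[0.0] * 26]
fake_prediction[0][0] = 0.7   # 'a'
fake_prediction[0][2] = 0.2   # 'c'
fake_prediction[0][25] = 0.1  # 'z'

top3 = decode_top_k(fake_prediction, class_indices, top=3)
print(top3[0][0][0])  # most likely letter for the first image
```

Reading the labels from `class_indices` instead of hard-coding a-z keeps the mapping correct even if the folder names or their order change.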