Tags: python, machine-learning, keras, conv-neural-network, pre-trained-model

ValueError: `decode_predictions` expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 26)


I am using a model I trained myself to translate braille characters into plain text. As you can see, this is a classification problem with 26 classes, one for each letter of the alphabet.

This is the dataset that I used to train my model: https://www.kaggle.com/datasets/shanks0465/braille-character-dataset

This is how I am generating my training and validation set:

import os
from shutil import copyfile

# Create one subfolder per letter: ./images/a ... ./images/z
os.mkdir('./images/')
alpha = 'a'
for i in range(0, 26):
    os.mkdir('./images/' + alpha)
    alpha = chr(ord(alpha) + 1)

rootdir = "C:\\Users\\ffernandez\\Downloads\\capstoneProject\\Braille Dataset\\Braille Dataset\\"

# Each filename starts with the letter it represents, so copy every file
# into the matching letter folder
for file in os.listdir(rootdir):
    letter = file[0]
    copyfile(rootdir + file, './images/' + letter + '/' + file)

The resulting folder looks like this: [screenshot of the folder structure]
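Based on the copy loop above, the layout should be roughly:

images/
    a/    <- all files whose name starts with 'a'
    b/
    ...
    z/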

And this is how I create the train and validation split:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20,
                             shear_range=10,
                             validation_split=0.2)

train_generator = datagen.flow_from_directory('./images/',
                                              target_size=(28,28),
                                              subset='training')

val_generator = datagen.flow_from_directory('./images/',
                                            target_size=(28,28),
                                            subset='validation')
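
As a side note, flow_from_directory assigns class indices by sorting the subfolder names alphanumerically, so 'a' maps to index 0 and 'z' to index 25. This can be confirmed directly:

# Class indices follow the alphabetical order of the folder names
print(train_generator.class_indices)
# {'a': 0, 'b': 1, ..., 'z': 25}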

Finally, this is the code for designing, compiling and training the model:

from tensorflow.keras import backend as K
from tensorflow.keras import layers as L
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.regularizers import l2

K.clear_session()

model_ckpt = ModelCheckpoint('BrailleNet.h5', save_best_only=True)
reduce_lr = ReduceLROnPlateau(patience=8, verbose=0)
early_stop = EarlyStopping(patience=15, verbose=1)

entry = L.Input(shape=(28,28,3))
x = L.SeparableConv2D(64,(3,3),activation='relu')(entry)
x = L.MaxPooling2D((2,2))(x)
x = L.SeparableConv2D(128,(3,3),activation='relu')(x)
x = L.MaxPooling2D((2,2))(x)
x = L.SeparableConv2D(256,(2,2),activation='relu')(x)
x = L.GlobalMaxPooling2D()(x)
x = L.Dense(256)(x)
x = L.LeakyReLU()(x)
x = L.Dense(64,kernel_regularizer=l2(2e-4))(x)
x = L.LeakyReLU()(x)
x = L.Dense(26,activation='softmax')(x)

model = Model(entry,x)
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

history = model.fit(train_generator, validation_data=val_generator, epochs=666,
                    callbacks=[model_ckpt, reduce_lr, early_stop], verbose=0)
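
Because save_best_only=True, the weights left in memory after training come from the last epoch rather than the best one, so for inference it may be better to reload the checkpoint (a minimal sketch):

from tensorflow.keras.models import load_model

# Reload the best weights written by ModelCheckpoint during training
model = load_model('BrailleNet.h5')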

Then this is the code for testing an image of the letter 'a' in braille that has the same size as the training and validation images (28x28):

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

img_path = "./test/a1.JPG10whs.jpg"
img = plt.imread(img_path)
img_array = tf.keras.utils.img_to_array(img)
img_batch = np.expand_dims(img_array, axis=0)

img_preprocessed = tf.keras.applications.resnet50.preprocess_input(img_batch)
prediction = model.predict(img_preprocessed)

print(tf.keras.applications.imagenet_utils.decode_predictions(prediction, top=3)[0])

Just when I execute that last line of code this error appears:

ValueError: decode_predictions expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 26)

I found a similar question here on Stack Overflow (ValueError: `decode_predictions` expects a batch of predictions (i.e. a 2D array of shape (samples, 1000)). Found array with shape: (1, 7)).

I've seen that using `decode_predictions` only makes sense if your model outputs the ImageNet classes (1000-dimensional), but if I can't use `decode_predictions` I don't know how to get my predictions.

My desired output would be like:

prediction = model.predict(img_preprocessed)
print(prediction)

output: 'a'

Any hint or suggestion on how to solve this issue is highly appreciated.


Solution

  • If we take a look at what the prediction object actually is, we can see that it has 26 values. These values are the probabilities that the model predicts for each letter: [screenshot of the prediction array]

    So we need a way to map each prediction value to its respective letter. A simple way to do this is to create a list of all 26 possible letters and look up the index of the maximum value in the prediction array. Example:

    # Create prediction labels from a-z, in the same alphabetical order
    # that flow_from_directory uses for its class indices
    alpha = "a"
    labels = ["a"]
    for i in range(0, 25):
        alpha = chr(ord(alpha) + 1)
        labels.append(alpha)
    # Look up the letter with the highest predicted probability
    labels[np.argmax(prediction)]
    

    The output should be the character with the highest probability, i.e. 'a' for the test image of the letter 'a' above. A full end-to-end sketch is given below.
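
    Putting it together, a minimal end-to-end inference sketch could look like the following. The preprocessing here is an assumption: since the training ImageDataGenerator applies no rescaling and no ImageNet-style preprocessing, the test image is only resized to 28x28 and resnet50.preprocess_input is left out.

    import numpy as np
    import tensorflow as tf

    # Load the best checkpoint saved by ModelCheckpoint during training
    model = tf.keras.models.load_model('BrailleNet.h5')

    # Load the test image at the training resolution (28x28, RGB)
    img = tf.keras.utils.load_img("./test/a1.JPG10whs.jpg", target_size=(28, 28))
    img_batch = np.expand_dims(tf.keras.utils.img_to_array(img), axis=0)

    # Predict and map class indices back to letters (alphabetical order,
    # matching train_generator.class_indices)
    prediction = model.predict(img_batch)
    labels = [chr(ord('a') + i) for i in range(26)]

    print(labels[np.argmax(prediction[0])])   # best guess, e.g. 'a'

    # Top-3 letters with probabilities, similar to what decode_predictions
    # reports for ImageNet models
    for idx in np.argsort(prediction[0])[::-1][:3]:
        print(labels[idx], float(prediction[0][idx]))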