I'm trying to use a Keras neural network in TensorFlow to recognize handwritten digits, but I don't know why predict() returns the same result for all of my input images.
Here is the code:
### Train dataset ###
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255  # scale pixels to [0, 1]
x_test = x_test / 255
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(units=128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=10, activation=tf.nn.softmax))
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
The training output looks like this:
Epoch 1/5
1875/1875 [==============================] - 2s 672us/step - loss: 0.2620 - accuracy: 0.9248
Epoch 2/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.1148 - accuracy: 0.9658
Epoch 3/5
1875/1875 [==============================] - 1s 559us/step - loss: 0.0784 - accuracy: 0.9764
Epoch 4/5
1875/1875 [==============================] - 1s 564us/step - loss: 0.0596 - accuracy: 0.9817
Epoch 5/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.0462 - accuracy: 0.9859
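(A minimal sanity check, not in the original snippet: the x_test/y_test split loaded above is never used, and evaluating on it would confirm that the trained model itself generalizes, so any problem lies in the inference preprocessing:)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')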
Then the code to test with my own image is below:
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

img = cv.imread('path/to/1.png')
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
img = cv.resize(img, (28, 28))
if cv.countNonZero(255 - img) == 0:  # image is completely white
    print('image is blank')
img = np.invert(img)  # white digit on black background, like MNIST
img = np.array([img])
plt.imshow(img[0])
plt.show()
prediction = model.predict(img)
result = np.argmax(prediction)
print(prediction)
print(f'Result: {result}')
The result is:
[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3
[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3
Normalize your data at inference time the same way you did on the training set:
img = np.array([img]) / 255
Check this answer (Inference) for more details.
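Applied to the snippet from the question, a minimal corrected version (a sketch, assuming the same model and the cv/np imports from above) looks like this:
img = cv.imread('path/to/1.png')
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
img = cv.resize(img, (28, 28))
img = np.invert(img)          # white digit on black, like MNIST
img = np.array([img]) / 255   # normalize exactly as in training
prediction = model.predict(img)
print(f'Result: {np.argmax(prediction)}')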
Based on your 3rd comment, here are some details.
import cv2
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

def input_prepare(img):
    img = cv2.resize(img, (28, 28))                  # downsample to MNIST size
    img = cv2.bitwise_not(img)                       # invert: white digit on black
    img = tf.cast(tf.divide(img, 255), tf.float64)   # normalize to [0, 1]
    img = tf.expand_dims(img, axis=0)                # add batch dimension
    return img
img = cv2.imread('/content/1.png')
orig = img.copy()  # save for plotting later on
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # grayscale
img = input_prepare(img)

plt.imshow(tf.reshape(img, shape=[28, 28]))  # inspect the prepared input
plt.show()

plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()
It works as expected. But because of the resizing, the digit strokes get broken and lose some of their spatial information. That is still fine for the model here, but if the degradation gets much worse, the model will predict incorrectly. Here is such a case, where the model predicts wrong:
plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()
To fix this, we can apply cv2.erode to thicken the strokes after resizing (erosion shrinks the white background, which widens the dark digit), for example:
def input_prepare(img):
    img = cv2.resize(img, (28, 28))
    img = cv2.erode(img, np.ones((2, 2)))            # thicken the dark strokes
    img = cv2.bitwise_not(img)                       # invert: white digit on black
    img = tf.cast(tf.divide(img, 255), tf.float64)   # normalize to [0, 1]
    img = tf.expand_dims(img, axis=0)                # add batch dimension
    return img
This is perhaps not the best approach, but now the model can read the digit much better.
plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()
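If erosion alone is not enough, one further option (a sketch of my own, not from the original answer, with a hypothetical name input_prepare_padded) is to pad the grayscale image to a square with cv2.copyMakeBorder before resizing, so the digit keeps its aspect ratio and the strokes distort less:
def input_prepare_padded(img):
    h, w = img.shape
    d = abs(h - w)
    top, bottom, left, right = 0, 0, 0, 0
    if h > w:
        left, right = d // 2, d - d // 2
    else:
        top, bottom = d // 2, d - d // 2
    # pad with white (the background color) to make the image square
    img = cv2.copyMakeBorder(img, top, bottom, left, right,
                             cv2.BORDER_CONSTANT, value=255)
    img = cv2.resize(img, (28, 28))
    img = cv2.bitwise_not(img)
    img = tf.cast(tf.divide(img, 255), tf.float64)
    img = tf.expand_dims(img, axis=0)
    return img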