tensorflow machine-learning keras image-recognition mnist

How to clean images for use with an MNIST-trained model?

I am creating a machine learning model for classifying images of digits. I trained the model with TensorFlow and Keras on the built-in tf.keras.datasets.mnist dataset. The model works quite well on the test images from the MNIST dataset itself, but I would like to feed it images of my own. The images I am feeding it are extracted from a Captcha, so they all follow a similar pattern. I have included some examples of the images in this public Google Drive folder. When I feed in these images, the model is not very accurate, and I have some guesses as to why:

The background of the image creates too much noise in the picture.
The number is not centered.
The image is not strictly in the color format of the MNIST training set (black background, white text).

I wanted to ask how I can remove the background and center the digit so that the noise in the image is reduced, allowing for better classification.

Here is the model I am using:

import tensorflow as tf
from tensorflow import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

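class Stopper(keras.callbacks.Callback):
    # Stop training once the training accuracy reaches 99%.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # The metric key is 'acc' in older Keras releases and 'accuracy' in newer ones.
        acc = logs.get('accuracy', logs.get('acc'))
        if acc is not None and acc >= 0.99:
            self.model.stop_training = True
            print('\nReached 99% Accuracy. Stopping Training...')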

model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(
    optimizer='adam',  # tf.train.AdamOptimizer() exists only in TensorFlow 1.x
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

x_train, x_test = x_train / 255, x_test / 255

model.fit(x_train, y_train, epochs=10, callbacks=[Stopper()])

And here is my method of importing the image into tensorflow:

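import numpy as np
from PIL import Image

# Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS.
img = Image.open("image_file_path").convert('L').resize((28, 28), Image.LANCZOS)
# Scale to [0, 1] to match the training data, which was divided by 255.
img = np.array(img) / 255.0
model.predict(img[None, :, :])  # add a leading batch axis: (28, 28) -> (1, 28, 28)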

I have also included some examples from the MNIST dataset here. I would like a script that converts my images to match the MNIST dataset format as closely as possible. Also, since I will have to do this for an indefinite number of images, I would appreciate it if you could provide a fully automated method for this conversion.

Solution

You need to train with a dataset similar to the images you're testing. MNIST consists of handwritten digits, which will not resemble the computer-generated fonts in Captcha data.

What you need to do is build a catalog of Captcha data similar to what you're predicting on (preferably from the same source that will feed the final model). Capturing the data is a painstaking task, and you'll probably need around 300-400 images per label before you start to get something useful.

A key note: your model will only ever be as good as the training data you supply to it. Trying to make a good model with bad training data is an exercise in pure frustration.

To address some of your thoughts:

[the model is not very accurate because] the background of the image creates too much noise in the picture.

This is true. If the image data has noise and the neural net was not trained on any noisy images, it will not recognize a strong pattern when it encounters this type of distortion. One possible way to combat this is to take clean images and programmatically add noise (similar to what you see in the real Captcha) before sending them to be trained.
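As a rough illustration of adding noise programmatically, here is a minimal NumPy sketch. The function name add_noise and its default values are hypothetical, not tuned for any particular Captcha; it assumes the images are already scaled to [0, 1], and the Gaussian-plus-speckle mix is only one guess at what the real background noise looks like.

```python
import numpy as np

def add_noise(images, gaussian_std=0.1, speckle_fraction=0.05, rng=None):
    """Return a noisy copy of images whose values are in [0, 1].

    gaussian_std and speckle_fraction are illustrative defaults only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Additive Gaussian noise on every pixel.
    noisy = images.astype(np.float32) + rng.normal(0.0, gaussian_std, images.shape)
    # Set a random fraction of pixels to full white to mimic speckles.
    speckles = rng.random(images.shape) < speckle_fraction
    noisy[speckles] = 1.0
    # Keep the result in the same [0, 1] range as the input.
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)
```

You would apply it to the clean training images before fitting, for example x_train_noisy = add_noise(x_train), and ideally compare a few outputs by eye against real Captcha crops.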

[the model is not very accurate because] The number is not centered.

Also true, for the same reason. If all the training data is centered, the model will be overfit to this property and will make incorrect guesses on off-center digits. If you don't have the capacity to manually capture and catalog a good sample of data, follow a similar approach to the one above: programmatically shift the clean images before training.
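A sketch of what shifting the training images could look like, using only NumPy. The names shift_image and random_shift and the max_shift default are hypothetical; the sketch assumes a 2D image with a zero (black) background, so pixels exposed by the shift are filled with 0.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift a 2D image by (dy, dx) pixels, filling exposed pixels with 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Source and destination windows that still overlap after the shift.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def random_shift(image, max_shift=4, rng=None):
    """Shift by a random offset of up to max_shift pixels along each axis."""
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return shift_image(image, dy, dx)
```

Applying random_shift to each training image (possibly several times per image) gives the model examples where the digit is not centered.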

[the model is not very accurate because] The image is not strictly in the color format of the MNIST training set (black background, white text).

You can get around this by applying a binary threshold to the data before processing, or by normalizing the color input before training. Depending on the amount of noise in the Captcha, you may get better results by letting the number and noise retain some of their intensity information (still convert to grayscale and normalize, just don't apply the threshold).
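Both options can be sketched in one small NumPy helper. The name normalize_gray and the 0.5 threshold are hypothetical choices, and the inversion step rests on an assumption: the background covers most of the image, so a bright median means the image is dark-on-light and needs flipping to match MNIST.

```python
import numpy as np

def normalize_gray(gray, binarize=True, threshold=0.5):
    """Scale a 2D uint8 grayscale image to [0, 1] with a dark background.

    Assumption: the background covers most of the image, so a bright
    median means dark text on a light background.
    """
    scaled = np.asarray(gray, dtype=np.float32) / 255.0
    if np.median(scaled) > 0.5:
        # Invert so the digit is light on dark, as in MNIST.
        scaled = 1.0 - scaled
    if binarize:
        # Hard threshold: every pixel becomes exactly 0.0 or 1.0.
        scaled = (scaled > threshold).astype(np.float32)
    return scaled
```

Call it with binarize=False to keep the grayscale values and only normalize. Whichever variant you choose, apply the same one to both the training images and the images you predict on.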

Additionally, I'd recommend using a convolutional network rather than the fully connected one, as it is better at distinguishing 2D features like edges and corners; i.e., use keras.layers.Conv2D layers before flattening with keras.layers.Flatten.

See the great example found here: Trains a simple convnet on the MNIST dataset.

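import tensorflow as tf

# Assumed values for 28x28 grayscale digit images; adjust them for your data.
input_shape = (28, 28, 1)  # height, width, channels
num_classes = 10           # digits 0-9

model = tf.keras.models.Sequential(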
    [
        tf.keras.layers.Conv2D(
            32,
            kernel_size=(3, 3),
            activation=tf.nn.relu,
            input_shape=input_shape,
        ),
        tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(
            num_classes, activation=tf.nn.softmax
        ),
    ]
)
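One practical detail: Conv2D layers expect a trailing channel axis, which grayscale arrays of shape (n, 28, 28) do not have. A minimal sketch of the conversion, assuming uint8 input images; the helper name prepare_for_conv is hypothetical.

```python
import numpy as np

def prepare_for_conv(images):
    """Scale uint8 grayscale images to [0, 1] and append a channel axis.

    (n, 28, 28) -> (n, 28, 28, 1)
    """
    images = np.asarray(images, dtype=np.float32) / 255.0
    return images[..., np.newaxis]
```

The same function works for a single image if you add the batch axis first, e.g. prepare_for_conv(img[None, :, :]).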

I've used this setup for reading fonts in video gameplay footage. With a test set of 10,000 images, I'm achieving 99.98% accuracy, training on a random sample of half the dataset and calculating accuracy over the total set.
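How that random half was drawn isn't spelled out above; one way to do it, sketched with NumPy under that caveat (split_half is a hypothetical helper name):

```python
import numpy as np

def split_half(n_samples, rng=None):
    """Return (train_idx, held_out_idx), each a random half of range(n_samples)."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_samples)
    half = n_samples // 2
    return order[:half], order[half:]
```

Train on the images at train_idx. Since the total set includes those training images, accuracy measured on held_out_idx alone is the stricter check of how the model handles images it has not seen.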