python image-processing machine-learning neural-network feed-forward

How to increase accuracy of neural networks


I'm trying to build a simple neural network that classifies product images into different labels (product types), i.e., given a new product image, it should tell which product category (books, toys, electronics, etc.) the image belongs to.

I have a couple of product images under each product number, and each product number has a label (i.e., product type) in an Excel sheet.

Below is my code:

from sklearn.preprocessing import LabelEncoder
from sklearn.cross_validation import train_test_split
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import np_utils
from imutils import paths
import numpy as np
import argparse
import cv2
import os
import xlwt
import xlrd
import glob2
import pickle

def image_to_feature_vector(image, size=(32,32)):
    return cv2.resize(image, size).flatten()

def read_data(xls = "/Desktop/num_to_product_type.xlsx"):
    book = xlrd.open_workbook(xls)
    sheet = book.sheet_by_index(0)
    d = {}
    for row_index in xrange(1, sheet.nrows): # skip heading row
        prod_type, prod_num = sheet.row_values(row_index, end_colx=2)
        prod_type = unicode(prod_type).encode('UTF8')
        prod_num = unicode(prod_num).encode('UTF8')

        d[prod_num] = prod_type
    return d

def main():

    try:
        imagePaths=[]
        print("[INFO] describing images...")
        for path, subdirs, files in os.walk(r'/Desktop/data'):
            for filename in files:
                imagePaths.append(os.path.join(path, filename))

        files = glob2.glob('/Desktop/data/**/.DS_Store')
        for i in files:
            imagePaths.remove(i)  
    except:
        pass

    dd = read_data() 
    # initialize the data matrix and labels list
    data = []
    labels1 = []

    for (i, imagePath) in enumerate(imagePaths):
        image = cv2.imread(imagePath)
        #print(image.shape)
        subdir = imagePath.split('/')[-2]
        for k, v in dd.items():
            if k == subdir:
                label = v
                break

        features = image_to_feature_vector(image)
        data.append(features)
        labels1.append(label)


        # show an update every 1,000 images
        if i > 0 and i % 1000 == 0:
            print("[INFO] processed {}/{}".format(i, len(imagePaths)))
    print("String Labels")
    print(labels1)

    # encode the labels, converting them from strings to integers
    le = LabelEncoder()
    labels = le.fit_transform(labels1)
    print(labels) 

    d={}
    d[labels[0]] = labels1[0]

    for i in range(1,len(labels)-1):
        if labels[i-1] != labels[i] and labels[i] == labels[i+1]:
            d[labels[i]]  = labels1[i]

    data = np.array(data) / 255.0
    labels = np_utils.to_categorical(labels, 51)
    print("To_Categorical")
    print(labels) 

    print("[INFO] constructing training/testing split...")
    (trainData, testData, trainLabels, testLabels) = train_test_split(
        data, labels, test_size=0.25, random_state=42)

    model = Sequential()
    model.add(Dense(768, input_dim=3072, init="uniform",
        activation="relu"))
    model.add(Dense(384, init="uniform", activation="relu"))
    model.add(Dense(51))
    model.add(Activation("softmax"))

    print("[INFO] compiling model...")

    sgd = SGD(lr=0.125
              )
    model.compile(loss="categorical_crossentropy", optimizer=sgd,
        metrics=["accuracy"])
    model.fit(trainData, trainLabels, nb_epoch=50, batch_size=750)


    # Test the model
    # show the accuracy on the testing set
    print("[INFO] evaluating on testing set...")
    (loss, accuracy) = model.evaluate(testData, testLabels,
        batch_size=128, verbose=1)
    print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,
        accuracy * 100))

if __name__ == '__main__':

    main()

The neural network is a 3072-768-384-51 feedforward neural network. Layer 0 contains 3072 inputs (32 x 32 x 3 pixel values). Layers 1 and 2 are hidden layers containing 768 and 384 nodes, respectively. Layer 3 is the output layer, which has 51 nodes (one per product category type). However, with this I'm getting very low accuracy, only about 45-50%.

Am I doing something wrong? How do you increase the accuracy of a neural network? I read somewhere that it can be done by "cross-validation and hyperparameter tuning", but how is that done? Sorry, I'm very new to neural networks and just trying something new. Thanks.


Solution

  • For creating an image classifier in Keras, I would suggest trying a convolutional neural network, as they tend to work much better for images. Also, normalizing between layers can stabilize training, which should help yield better validation/test accuracy. (It is the same concept as normalizing the data before training.)

    For a Keras convolutional layer, simply call model.add(Conv2D(params)), and to normalize between layers call model.add(BatchNormalization()).

    Convolutional neural networks are more advanced but better suited to images. At a high level, a convolutional layer is just a "mini" neural network that scans over patches of the image. This matters because you can have the EXACT same object in two images, but if it sits in a different place in each, a normal (fully connected) neural network treats it as two different objects rather than the same object in different places.

    Because this "mini" neural network scans the image in patches (the patch size is often referred to as the kernel size), it is more inclined to pick up on similar features of objects. Those features are trained into the network, so even if the object appears in different areas of your images it can be more accurately recognized as the same thing. This is the key reason a convolutional neural network works better for images.
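
To make the patch-scanning idea concrete, here is a minimal NumPy sketch (not Keras code, and not how Keras implements it internally). The helper conv2d_valid, the toy 6x6 images, and the hand-picked kernel are all assumptions made up for illustration; a real network learns its kernels during training.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# The same 2x2 "object" placed at two different positions in empty 6x6 images.
obj = np.array([[1.0, 2.0], [3.0, 4.0]])
img_a = np.zeros((6, 6))
img_a[0:2, 0:2] = obj
img_b = np.zeros((6, 6))
img_b[3:5, 2:4] = obj

# Hand-picked kernel that matches the object (a trained network would learn this).
kernel = obj.copy()

resp_a = conv2d_valid(img_a, kernel)
resp_b = conv2d_valid(img_b, kernel)

# Location of the strongest response in each image.
peak_a = tuple(int(v) for v in np.unravel_index(np.argmax(resp_a), resp_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(np.argmax(resp_b), resp_b.shape))

# Flattened, the two images share no active pixels at all.
flat_overlap = float(np.dot(img_a.flatten(), img_b.flatten()))

print(resp_a.max(), peak_a)
print(resp_b.max(), peak_b)
print(flat_overlap)
```

The kernel gives the same peak response in both images, only at a different location, whereas the flattened vectors that a dense layer would see have nothing in common.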

    Here is a basic example in Keras 2 with normalization, based on an NVIDIA model architecture:

            from keras.models import Sequential
            from keras.layers import (Cropping2D, Lambda, Conv2D, BatchNormalization,
                                      Dropout, Flatten, Dense)

            model = Sequential()
            # crop the images to get rid of irrelevant features if needed...
            # input_shape is (height, width, channels); (32, 32, 3) assumes the
            # question's 32x32 colour images, kept as images rather than flattened
            model.add(Cropping2D(cropping=((0, 0), (0, 0)), input_shape=(32, 32, 3)))
            model.add(Lambda(lambda x: (x - 128) / 128)) # scale raw 0-255 pixels to roughly [-1, 1]
            model.add(Conv2D(24, (2,2), strides=(2,2), padding='valid', activation='elu')) # 1st convolution
            model.add(BatchNormalization()) # normalize between layers
            model.add(Conv2D(36, (2,2), strides=(2,2), padding='valid', activation='elu')) # 2nd convolution
            model.add(BatchNormalization())
            model.add(Conv2D(48, (1,1), strides=(2,2), padding='valid', activation='elu')) # 3rd convolution
            model.add(BatchNormalization())
            # model.add(Conv2D(64, (3,3), strides=(1,1), padding='valid', activation='elu')) # 4th convolution
            # model.add(BatchNormalization())
            # model.add(Conv2D(64, (3,3), strides=(1,1), padding='valid', activation='elu')) # 5th convolution
            # model.add(BatchNormalization())
            model.add(Dropout(0.5))
            model.add(Flatten()) # flatten the dimensions
            model.add(Dense(100, activation='elu')) # 1st fully connected layer
            model.add(BatchNormalization())
            model.add(Dropout(0.5))
            model.add(Dense(51, activation='softmax')) # label output as probabilities
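
One practical point when switching from the dense network in the question: a convolutional model expects each sample as a (height, width, channels) image, not a flattened vector. The sketch below assumes the 32x32 three-channel images implied by the question's 3072 inputs. The random flat array is only a stand-in for the question's data, and valid_output_size is a hypothetical helper, not a Keras function. It also works out how the spatial size shrinks through the three 'valid' convolutions above.

```python
import numpy as np

def valid_output_size(n, kernel, stride):
    """Output length along one axis for a 'valid' (unpadded) convolution."""
    return (n - kernel) // stride + 1

# Stand-in for the question's `data`: 10 flattened 32x32x3 images with raw 0-255 values.
flat = np.random.randint(0, 256, size=(10, 3072)).astype("float32")

# flatten() walks a (32, 32, 3) array in row-major order, so reshape undoes it.
images = flat.reshape(-1, 32, 32, 3)

# (kernel, stride) of the three active Conv2D layers in the example model.
size = 32
for kernel, stride in [(2, 2), (2, 2), (1, 2)]:
    size = valid_output_size(size, kernel, stride)

# The last active convolution has 48 filters, so Flatten() yields this many features.
flattened_features = size * size * 48

print(images.shape, size, flattened_features)
```

If you keep the Lambda layer that computes (x - 128) / 128, feed it raw 0-255 pixel values; dividing by 255 beforehand, as the question's code does, would scale the data twice.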
    

    Lastly, hyperparameter tuning just means adjusting settings such as batch size, number of epochs, and learning rate to achieve the best result. Mostly you have to experiment and see what works best.
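
Since the question also asks how cross-validation fits in, here is a rough, framework-agnostic sketch of combining k-fold cross-validation with a small grid search. All names here (k_fold_indices, grid_search_cv, train_and_score) are hypothetical helpers written for this illustration, not part of Keras or scikit-learn. The fake scorer only exists so the sketch runs without training anything; in practice train_and_score would build a fresh model, fit it on the training fold, and return the validation accuracy.

```python
import itertools
import numpy as np

def k_fold_indices(n_samples, k, seed=42):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.RandomState(seed)
    return np.array_split(rng.permutation(n_samples), k)

def grid_search_cv(train_and_score, X, y, grid, k=5):
    """Try every combination in `grid`, scoring each by its mean k-fold accuracy.

    `train_and_score(X_tr, y_tr, X_val, y_val, **params)` is a user-supplied
    (hypothetical) callable that trains a fresh model and returns a score.
    """
    folds = k_fold_indices(len(X), k)
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i in range(k):
            val_idx = folds[i]  # hold out fold i for validation
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            scores.append(train_and_score(X[train_idx], y[train_idx],
                                          X[val_idx], y[val_idx], **params))
        results.append((params, float(np.mean(scores))))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results

# Fake scorer so the sketch runs on its own: it pretends lr=0.01, batch_size=32 is best.
def fake_train_and_score(X_tr, y_tr, X_val, y_val, lr, batch_size):
    return 1.0 - abs(lr - 0.01) - abs(batch_size - 32) / 1000.0

X = np.zeros((20, 4))
y = np.zeros(20)
grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 128]}
best_params, best_score, results = grid_search_cv(fake_train_and_score, X, y, grid, k=5)
print(best_params, best_score)
```

Averaging over folds makes the comparison between settings less dependent on one lucky train/test split. Keep a separate test set that is never used during the search for the final accuracy figure.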