Tags: python-3.x, tensorflow, multiplication

TensorFlow is unable to train to predict simple multiplication


As a starting point, I am trying to create a neural network that predicts simple multiplication, with the goal of changing the formula at a later date. My initial thought was that it would be trivial to do, and the code itself is quite simple, but the model is not trainable at all. It learns nearly nothing, and its predictions do not even begin to approach anything reasonable.

The code is:

import tensorflow as tf

import numpy as np
import random

import os.path


def compileModel():
    # Two hidden Dense layers (1024 units each) feeding a single output unit
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(shape=(2,)))
    model.add(tf.keras.layers.Dense(1024))
    model.add(tf.keras.layers.Dense(1024))
    model.add(tf.keras.layers.Dense(units=1))

    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])

    return model

def generateData(num):
    # Build num input pairs drawn uniformly from [-100, 100]
    # along with their products as the targets
    vals = np.zeros((num, 2))
    res = np.zeros(num)
    for i in range(num):
        a = random.uniform(-100, 100)
        b = random.uniform(-100, 100)
        c = a * b

        vals[i][0] = a
        vals[i][1] = b
        res[i] = c
    return (vals, res)


modelFilename = 'saved_model.keras'
runTraining = False

if not runTraining and not os.path.isfile(modelFilename):
    runTraining = True

if runTraining:
    trainData, trainRes = generateData(100000)
    
    model = compileModel()
    
    model.fit(x=trainData, y=trainRes, epochs=12, batch_size=100)
    model.save(filepath=modelFilename)
else:
    model = tf.keras.models.load_model(modelFilename)


testData, testRes = generateData(1000)
model.evaluate(x=testData, y=testRes)

print(testData.shape)

res = model.predict(testData[0:1])
print(testData[0:1])
print(testRes[0:1])
print(res)

Here I am generating two numbers from -100 to 100 as the input and simply multiplying them to get the correct answer. Everything runs, and training appears to go through the epochs, but then nothing useful is predicted.

My guess would be that I have made a mistake in setting up the model itself. Here I am using two dense layers with 1024 units each. I have tried playing with the number of layers and with the number of units, but all that does is increase or decrease the time it takes for the model to train.


Solution

  • I think part of what might really be hurting you here is the wide range of the inputs; in my experience these models seem to perform best when values are in the neighborhood of 0 to 1. Adding normalization to the input and denormalization to the output may help quite a bit (see the sketch below).

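    As a rough sketch (my addition, not part of the code I tested below): since the inputs are known to come from [-100, 100], a fixed rescaling is enough, reusing the question's generateData and model:

    # Inputs lie in [-100, 100], so dividing by 100 maps them to [-1, 1];
    # products of the scaled inputs then lie in [-1, 1] as well.
    X_SCALE = 100.0
    Y_SCALE = X_SCALE * X_SCALE

    trainData, trainRes = generateData(100000)
    model.fit(x=trainData / X_SCALE, y=trainRes / Y_SCALE, epochs=12, batch_size=100)

    # Undo the target scaling when reading predictions back out.
    pred = model.predict(testData / X_SCALE) * Y_SCALE
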
    Another factor that is likely hurting performance is that the activation function on your dense layers is linear, which isn't great: a stack of purely linear layers collapses into a single linear transformation, and no linear map can represent a product of its inputs. I reran your same code with the activation function set to 'relu' for the dense layers and got approximately 1% error on the output.

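    To sanity-check an error figure like that, one quick measure (my addition; model and generateData as defined in the code below) is the mean relative error over fresh data, skipping products near zero so the division stays stable:

    testData, testRes = generateData(1000)
    pred = model.predict(testData).flatten()
    mask = np.abs(testRes) > 1.0          # avoid dividing by near-zero products
    relErr = np.mean(np.abs(pred[mask] - testRes[mask]) / np.abs(testRes[mask]))
    print(f"mean relative error: {relErr:.2%}")
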
    Here is the code I tested with (which has only minimal alterations):

    import tensorflow as tf
    
    import numpy as np
    import random
    
    import os.path
    
    
    def compileModel():
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.InputLayer(shape=(2,)))
        model.add(tf.keras.layers.Dense(1024, activation='relu'))  # was linear
        model.add(tf.keras.layers.Dense(1024, activation='relu'))  # was linear
        model.add(tf.keras.layers.Dense(units=1))
    
        model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
    
        return model
    
    
    def generateData(num):
        vals = np.zeros((num, 2))
        res = np.zeros(num)
        for i in range(num):
            a = random.uniform(-100, 100)
            b = random.uniform(-100, 100)
            c = a * b
    
            vals[i][0] = a
            vals[i][1] = b
            res[i] = c
        return (vals, res)
    
    
    modelFilename = 'saved_model.keras'
    runTraining = False
    
    if not runTraining and not os.path.isfile(modelFilename):
        runTraining = True
    
    if runTraining:
        trainData, trainRes = generateData(100000)
    
        model = compileModel()
    
        model.fit(x=trainData, y=trainRes, epochs=12, batch_size=100)
        model.save(filepath=modelFilename)
    else:
        model = tf.keras.models.load_model(modelFilename)
    
    testData, testRes = generateData(1000)
    model.evaluate(x=testData, y=testRes)
    
    print(testData.shape)
    
    res = model.predict(testData[0:1])
    print(testData[0:1])
    print(testRes[0:1])
    print(res)

    And here is the output it generated:

    Epoch 1/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - loss: 1773706.0000 - mse: 1773706.1250
    Epoch 2/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 409373.8438 - mse: 409373.8438
    Epoch 3/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 115835.0000 - mse: 115835.0000
    Epoch 4/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 13953.7402 - mse: 13953.7402
    Epoch 5/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 4830.2773 - mse: 4830.2773
    Epoch 6/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 2523.4836 - mse: 2523.4836
    Epoch 7/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - loss: 1607.8860 - mse: 1607.8860
    Epoch 8/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1378.7347 - mse: 1378.7347
    Epoch 9/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1488.6558 - mse: 1488.6558
    Epoch 10/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1009.2731 - mse: 1009.2731
    Epoch 11/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - loss: 1667.1349 - mse: 1667.1349
    Epoch 12/12
    1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 863.3384 - mse: 863.3384
    32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 621.0710 - mse: 621.0710 
    (1000, 2)
    1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
    [[ 76.60049687 -70.62534063]]
    [-5409.93618401]
    [[-5450.9204]]
    

    And it's worth noticing that the model is still making good strides in improving the error with each epoch (epoch 12 roughly halved the loss of epoch 11), so the approximation should improve further with more epochs as the model finishes converging.
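
    If you want to let it keep converging without hand-picking an epoch count, one option (my addition, not something I ran here) is to raise epochs and let Keras's EarlyStopping callback halt training once the loss plateaus:

    stop = tf.keras.callbacks.EarlyStopping(
        monitor="loss",              # training loss; no validation split is used here
        patience=5,                  # allow 5 stagnant epochs before stopping
        min_delta=1.0,               # require the MSE to drop by at least this much
        restore_best_weights=True,   # keep the best weights seen, not the last
    )
    model.fit(x=trainData, y=trainRes, epochs=100, batch_size=100, callbacks=[stop])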