As a starting point, I am trying to create a neural network that predicts simple multiplication, with the goal of changing the formula at a later date. My initial thought was that it would be trivial to do, and the code itself is quite simple, but the model does not train at all. It learns nearly nothing, and its predictions don't even begin to approach anything reasonable.
The code is:
import tensorflow as tf
import numpy as np
import random
import os.path
def compileModel():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(shape=(2,)))
    model.add(tf.keras.layers.Dense(1024))
    model.add(tf.keras.layers.Dense(1024))
    model.add(tf.keras.layers.Dense(units=1))
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
    return model

def generateData(num):
    vals = np.zeros((num, 2))
    res = np.zeros(num)
    for i in range(num):
        a = random.uniform(-100, 100)
        b = random.uniform(-100, 100)
        c = a * b
        vals[i][0] = a
        vals[i][1] = b
        res[i] = c
    return (vals, res)
modelFilename = 'saved_model.keras'
runTraining = False
if not runTraining and not os.path.isfile(modelFilename):
    runTraining = True

if runTraining:
    trainData, trainRes = generateData(100000)
    model = compileModel()
    model.fit(x=trainData, y=trainRes, epochs=12, batch_size=100)
    model.save(filepath=modelFilename)
else:
    model = tf.keras.models.load_model(modelFilename)
testData, testRes = generateData(1000)
model.evaluate(x=testData, y=testRes)
print(testData.shape)
res = model.predict(testData[0:1])
print(testData[0:1])
print(testRes[0:1])
print(res)
Here I am generating two numbers from -100 to 100 as the input and simply multiplying them to get the correct answer. Everything runs, and the training appears to go through its epochs, but nothing useful is predicted.
My guess would be that I have made a mistake in setting up the model itself. Here I am using two dense layers with 1024 units each. I have tried playing with the number of layers and with the number of units, but all that does is increase or decrease the time it takes for the model to train.
I think part of what might really be hurting you here is the wide range of the inputs; in my experience these models tend to perform best when values are in the neighborhood of 0 to 1. Adding normalization to the input and denormalization to the output may help quite a bit, as in the sketch below.
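As a rough sketch of what I mean (untested; the Rescaling layer is just one way to handle the input side, and the scale constants and the compileNormalizedModel name are illustrative, assuming your inputs stay in the -100 to 100 range):

# Inputs lie in [-100, 100], so products lie in [-10000, 10000];
# these constants are tied to that assumed range.
INPUT_SCALE = 100.0
OUTPUT_SCALE = 10000.0

def compileNormalizedModel():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(shape=(2,)))
    # Divide the inputs by 100 so the network sees values near [-1, 1]
    model.add(tf.keras.layers.Rescaling(1.0 / INPUT_SCALE))
    model.add(tf.keras.layers.Dense(1024, activation='relu'))
    model.add(tf.keras.layers.Dense(1024, activation='relu'))
    model.add(tf.keras.layers.Dense(units=1))
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
    return model

# Train against scaled targets, then undo the scaling on the predictions:
# model.fit(x=trainData, y=trainRes / OUTPUT_SCALE, epochs=12, batch_size=100)
# prediction = model.predict(testData[0:1]) * OUTPUT_SCALE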
Another factor that is likely hurting performance is that your dense layers use the default linear activation. A stack of dense layers with linear activations collapses into a single linear transformation, and multiplication is not a linear function of its inputs, so such a network cannot represent it. I reran your code with the activation set to 'relu' on the hidden dense layers and got approximately 1% error on the output.
Here is the code I tested with (which has only minimal alterations):
import tensorflow as tf
import numpy as np
import random
import os.path
def compileModel():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(shape=(2,)))
    model.add(tf.keras.layers.Dense(1024, activation='relu'))
    model.add(tf.keras.layers.Dense(1024, activation='relu'))
    model.add(tf.keras.layers.Dense(units=1))
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
    return model

def generateData(num):
    vals = np.zeros((num, 2))
    res = np.zeros(num)
    for i in range(num):
        a = random.uniform(-100, 100)
        b = random.uniform(-100, 100)
        c = a * b
        vals[i][0] = a
        vals[i][1] = b
        res[i] = c
    return (vals, res)
modelFilename = 'saved_model.keras'
runTraining = False
if not runTraining and not os.path.isfile(modelFilename):
    runTraining = True

if runTraining:
    trainData, trainRes = generateData(100000)
    model = compileModel()
    model.fit(x=trainData, y=trainRes, epochs=12, batch_size=100)
    model.save(filepath=modelFilename)
else:
    model = tf.keras.models.load_model(modelFilename)
testData, testRes = generateData(1000)
model.evaluate(x=testData, y=testRes)
print(testData.shape)
res = model.predict(testData[0:1])
print(testData[0:1])
print(testRes[0:1])
print(res)
And here is the output it generated:
Epoch 1/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - loss: 1773706.0000 - mse: 1773706.1250
Epoch 2/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 409373.8438 - mse: 409373.8438
Epoch 3/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 115835.0000 - mse: 115835.0000
Epoch 4/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 13953.7402 - mse: 13953.7402
Epoch 5/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 4830.2773 - mse: 4830.2773
Epoch 6/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 2523.4836 - mse: 2523.4836
Epoch 7/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - loss: 1607.8860 - mse: 1607.8860
Epoch 8/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1378.7347 - mse: 1378.7347
Epoch 9/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1488.6558 - mse: 1488.6558
Epoch 10/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 1009.2731 - mse: 1009.2731
Epoch 11/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - loss: 1667.1349 - mse: 1667.1349
Epoch 12/12
1000/1000 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - loss: 863.3384 - mse: 863.3384
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 621.0710 - mse: 621.0710
(1000, 2)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
[[ 76.60049687 -70.62534063]]
[-5409.93618401]
[[-5450.9204]]
And it's worth noting that the model is still making good strides in reducing the error with each epoch (epoch 12 roughly halved the error of the previous epoch), so the approximation should keep improving with more epochs as the model finishes converging.
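If you'd rather not hand-pick the epoch count, one standard Keras option (a sketch I did not run here) is to train for many epochs and let an EarlyStopping callback cut training off once the loss plateaus:

stopEarly = tf.keras.callbacks.EarlyStopping(
    monitor='loss',             # watch the training loss
    patience=5,                 # allow 5 epochs without improvement before stopping
    restore_best_weights=True)  # roll back to the best weights seen
model.fit(x=trainData, y=trainRes, epochs=100, batch_size=100,
          callbacks=[stopEarly])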