pythontensorflowmachine-learningconvergencelearning-rate

Why do I need a very high learning rate for this model to converge?


I have a simple model in tensorflow which is being trained on the first 1000 images in the MNIST datset. From my previous experience the learning rates which I used were of the order of around 0.001, however for my model to converge the learning rate needs to be far heigher, at least larger than 1. The model is shown below.

def gen_model():
    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(28,28,)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='sigmoid'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
model = gen_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=5), loss='mean_squared_error')
model.summary()
model.fit(x_train, y_train, batch_size=1000, epochs=10000)

Is it expected for models of this form to require an extremely high learning rate, or is there something I have missed? When I use a learning rate of around 0.001 the loss changes incredibly slowly.

The dataset was created with the following code:

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype("float32") / 255.0

x_train = x_train.reshape(60000,28,28)[:1000];
y_train = y_train[:1000];

y_train = tf.one_hot(y_train, 10)

Solution

  • Generally speaking, models that require learning rates larger than 1 raise a red flag for me. It seems like your model is a vanilla multilayer perceptron, so there's nothing overly complicated about that, but there are a couple things about your setup that stand out: