I've created a very simple TensorFlow model, and it works when I train it on one set of training data. However, if I add just one more training example, the loss goes to infinity and the model no longer works, even though the model itself is identical in both cases; the only difference is the single additional training example.
I would like to build a large training set, but that doesn't seem possible if the loss diverges once the training set gets too large. The prediction is also completely wrong in the case with the extra training example, whereas with one fewer example it is correct. In the code below, model has 20 training examples and its loss goes to infinity, while model2 has 19 training examples and its loss goes to (close to) zero.
<pre>
import tensorflow as tf
import numpy as np
from tensorflow import keras
print(tf.__version__)
def hw_function(x):
    y = (2. * x) - 1.
    return y
# Build a simple Sequential model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(units=1)])
# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')
# Declare model inputs and outputs for training
xs=[x for x in range(-1, 19, 1)]
ys=[x for x in range(-3, 36, 2)]
xs=np.array(xs, dtype=float)
ys=np.array(ys, dtype=float)
# Train the model
model.fit(xs, ys, verbose=1, epochs=500)
# Make a prediction
p = np.array([100.0, 900.0], dtype=float)
print(model.predict(p))
# Build exactly the same model but have one more training example
model2 = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(units=1)])
model2.compile(optimizer='sgd', loss='mean_squared_error')
xs2=[x for x in range(-1, 18, 1)]
ys2=[x for x in range(-3, 34, 2)]
xs2=np.array(xs2, dtype=float)
ys2=np.array(ys2, dtype=float)
# Train the model
model2.fit(xs2, ys2, verbose=1, epochs=500)
p = np.array([100.0, 900.0], dtype=float)
print(model2.predict(p))
</pre>
This has nothing to do with the number of examples, but rather with the magnitude of the values in the data. For example, you can extend the dataset "below" and it still works:
<pre>
xs=[x for x in range(-2, 18, 1)]
ys=[x for x in range(-5, 34, 2)]
</pre>
You likely hit the exact numerical threshold where the optimization becomes unstable; you can fix it by decreasing the learning rate. This fails with the "larger" dataset:
<pre>
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(optimizer=opt, loss='mean_squared_error')
</pre>
This works fine:
<pre>
opt = tf.keras.optimizers.SGD(learning_rate=0.001)
model.compile(optimizer=opt, loss='mean_squared_error')
</pre>
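If you want to see where that threshold comes from: with only 20 examples and Keras's default batch size of 32, each epoch is effectively one full-batch gradient descent step, and full-batch gradient descent on a quadratic loss is stable only when the learning rate is below 2 / lambda_max, where lambda_max is the largest eigenvalue of the loss Hessian. Here is a back-of-the-envelope numpy check (my own sketch, not anything Keras computes):

```python
import numpy as np

def critical_lr(xs):
    # For MSE loss on y = w*x + b, the Hessian w.r.t. (w, b) is
    # (2/n) * [[sum(x^2), sum(x)], [sum(x), n]].
    # Gradient descent diverges once lr exceeds 2 / lambda_max.
    n = len(xs)
    H = (2.0 / n) * np.array([[np.sum(xs**2), np.sum(xs)],
                              [np.sum(xs),    n]])
    return 2.0 / np.linalg.eigvalsh(H).max()

xs20 = np.arange(-1, 19, dtype=float)  # the 20-example dataset
xs19 = np.arange(-1, 18, dtype=float)  # the 19-example dataset
print(critical_lr(xs20))  # ~0.0094: below the default lr of 0.01, so it diverges
print(critical_lr(xs19))  # ~0.0106: just above 0.01, so it converges
```

So the 20th example really does push the critical learning rate just below the SGD default of 0.01, which is why one extra point tips the training from converging to diverging.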
As for the predictions being "totally wrong", I can't reproduce that. They are slightly off, but only because the learned parameters don't match the true ones exactly. E.g. in one run I get 1.996 * x - 0.9469 instead of 2 * x - 1, which is fairly close, though the difference grows for large inputs like 100 or 900.
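On the "large training set" point: rather than shrinking the learning rate every time the data grows, you can standardize the inputs, which keeps the loss curvature (and hence the safe learning rate) independent of the raw input scale. A minimal numpy sketch of the same linear fit — plain gradient descent rather than Keras, just to illustrate the idea:

```python
import numpy as np

xs = np.arange(-1, 19, dtype=float)  # the 20-example dataset that diverged
ys = 2.0 * xs - 1.0

# Standardize inputs to zero mean and unit variance.
mu, sigma = xs.mean(), xs.std()
xn = (xs - mu) / sigma

w, b = 0.0, 0.0
lr = 0.01  # the same rate that diverged on the raw data now converges
for _ in range(500):
    err = w * xn + b - ys
    w -= lr * 2.0 * np.mean(err * xn)  # d(MSE)/dw
    b -= lr * 2.0 * np.mean(err)       # d(MSE)/db

# Apply the same standardization at prediction time.
pred = w * (100.0 - mu) / sigma + b
print(pred)  # close to 2*100 - 1 = 199
```

In Keras you would get the same effect by standardizing xs before model.fit and applying the identical transform before model.predict.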