python, numpy, machine-learning, scikit-learn, sgd

Why can't sklearn's SGDRegressor converge to the correct optimum?


I was practicing with SGDRegressor in sklearn but ran into a problem, which I have simplified to the following code.

import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.array([0,0.5,1]).reshape((3,1))
y = np.array([0,0.5,1]).reshape((3,1))

sgd = SGDRegressor()  
sgd.fit(X, y.ravel())

print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)

And this is the output:

intercept= [0.19835632]
coef= [0.18652387]

All the outputs are around intercept=0.19 and coef=0.18, but obviously the correct answer is intercept=0 and coef=1. Even in this simple example the program can't recover the correct parameters. I wonder where I've made a mistake.
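As a quick sanity check on the claim that the correct answer is intercept=0 and coef=1, an exact least-squares fit on the same three points (here via `np.polyfit`, which is not part of the original post) recovers exactly those parameters:

```python
import numpy as np

X = np.array([0, 0.5, 1])
y = np.array([0, 0.5, 1])

# degree-1 polynomial fit returns [slope, intercept]
coef, intercept = np.polyfit(X, y, 1)
print("coef=", coef)            # ~1.0
print("intercept=", intercept)  # ~0.0
```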


Solution

  • With n=10000 data points (drawn with replacement from your 3 original points), SGD gets the following results:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import SGDRegressor
    
    n = 10000
    
    # draw n samples with replacement from the 3 original points
    X = np.random.choice([0, 0.5, 1], n, replace=True)
    y = X
    
    X = X.reshape((n, 1))
    
    sgd = SGDRegressor(verbose=1)
    sgd.fit(X, y)
    
    # -- Epoch 1
    # Norm: 0.86, NNZs: 1, Bias: 0.076159, T: 10000, Avg. loss: 0.012120
    # Total training time: 0.04 seconds.
    # -- Epoch 2
    # Norm: 0.96, NNZs: 1, Bias: 0.024337, T: 20000, Avg. loss: 0.000586
    # Total training time: 0.04 seconds.
    # -- Epoch 3
    # Norm: 0.98, NNZs: 1, Bias: 0.008826, T: 30000, Avg. loss: 0.000065
    # Total training time: 0.04 seconds.
    # -- Epoch 4
    # Norm: 0.99, NNZs: 1, Bias: 0.003617, T: 40000, Avg. loss: 0.000010
    # Total training time: 0.04 seconds.
    # -- Epoch 5
    # Norm: 1.00, NNZs: 1, Bias: 0.001686, T: 50000, Avg. loss: 0.000002
    # Total training time: 0.05 seconds.
    # -- Epoch 6
    # Norm: 1.00, NNZs: 1, Bias: 0.000911, T: 60000, Avg. loss: 0.000000
    # Total training time: 0.05 seconds.
    # -- Epoch 7
    # Norm: 1.00, NNZs: 1, Bias: 0.000570, T: 70000, Avg. loss: 0.000000
    # Total training time: 0.05 seconds.
    # Convergence after 7 epochs took 0.05 seconds
    
    print("intercept=", sgd.intercept_)
    print("coef=", sgd.coef_)
    # intercept= [0.00057032]
    # coef= [0.99892893]
    
    plt.plot(X, y, 'r.')
    plt.plot(X, sgd.intercept_ + sgd.coef_*X, 'b-')
    

    [Plot: resampled data points (red) with the fitted SGD line (blue)]

    The following animation shows how the SGD regressor converges to the correct optimum as n grows in the above code:

    [Animation: the SGD fit approaching intercept=0, coef=1 as n increases]
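The trend in that animation can be sketched with a simple loop over increasing n (the specific n values and the fixed seeds below are illustrative choices, not from the original answer):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Fit on progressively larger resamples of the 3 original points;
# the true relation is y = x, so we expect coef -> 1, intercept -> 0.
for n in (10, 100, 1000, 10000):
    X = rng.choice([0, 0.5, 1], n).reshape((n, 1))
    y = X.ravel()
    sgd = SGDRegressor(random_state=0).fit(X, y)
    print(f"n={n:6d}  intercept={sgd.intercept_[0]:.4f}  coef={sgd.coef_[0]:.4f}")
```

With only a handful of samples per epoch, SGD stops far from the optimum; as n grows, the estimates approach intercept=0, coef=1.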