python numpy machine-learning perceptron

Perceptron algorithm does not converge correctly


Given the following labelled data:

import numpy as np

feature_matrix = np.array \
    (
        [
            [   1,    1],
            [   2,    3],
            [   3,    4],
            [-0.5, -0.5],
            [  -1,   -2],
            [  -2,   -3],
            [  -3,   -4],
            [  -4,   -3]
        ]
    )

labels = np.array \
    (
        [
             1,
             1,
             1,
            -1,
            -1,
            -1,
            -1,
            -1
        ]
    )

The idea is to run the linear perceptron algorithm through the labelled data until convergence, in order to find the resulting parameter θ and offset parameter θ0. For reference, here is the perceptron function I wrote in Python:

def perceptron(feature_matrix, labels, convergence=True):

    # initialize theta and theta_0 to zero
    theta = np.zeros(len(feature_matrix[0])); theta_0 = 0

    while convergence:
        for i in range(len(feature_matrix)):

            exit = True

            if (labels[i] * np.dot(feature_matrix[i], theta) <= 0):

                theta   = theta   + labels[i] * feature_matrix[i]
                theta_0 = theta_0 + labels[i]

                exit = False

        if exit: break

    return (theta, theta_0)

I have used this code before, and can confirm that it works. The exit flag makes sure that the loop breaks if there are no mistakes; however, the resulting parameters are apparently not correct. The perceptron algorithm is initialised with the parameters set to zero, i.e. θ=[0,0] and θ0=0. The first datapoint to be explored is (1,1). Below are the values the function returns after it converges:

parameters = perceptron(feature_matrix, labels)

print('Theta_0:', parameters[-1])
print('Theta  :', parameters[0].tolist())

# Theta_0: 1
# Theta  : [1.0, 1.0]

These values seem off, since the decision boundary sits right on top of the datapoint (-0.5,-0.5), as can be seen in a Desmos graph here. I also noticed that if I change the order, say the first datapoint to be explored is (2,3), then:

parameters = perceptron(feature_matrix, labels)

print('Theta_0:', parameters[-1])
print('Theta  :', parameters[0].tolist())

# Theta_0: 1
# Theta  : [2.0, 3.0]

Regardless of the order of iteration, it seems like the perceptron algorithm takes the first datapoint's values as the resulting parameters, and never makes a mistake afterwards, as per the conditional clause (labels[i] * np.dot(feature_matrix[i], theta) <= 0).
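
A quick check of that condition with θ = [1, 1] against every datapoint (using the same feature_matrix and labels as above) seems to confirm it is never triggered again:

theta = np.array([1.0, 1.0])

# value of labels[i] * np.dot(feature_matrix[i], theta) for every sample;
# all entries are strictly positive, so the update branch is never entered again
print(labels * np.dot(feature_matrix, theta))
# [2. 5. 7. 1. 3. 5. 7. 7.]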

Why is this happening, and what would be the correct approach?


Solution

  • Let's start from the beginning. Your exit = True should be outside your loop over the examples, since you want to exit if and only if all samples are classified correctly. Currently you will exit as soon as the last one is classified correctly.

    The second issue is that, while you are updating theta_0, you never actually use it in your criterion. Luckily, for this dataset you do not need a bias, since an optimal classifier passes through the origin.

    Finally, (1, 1) actually is a correct solution precisely because theta_0 is forgotten (so with theta_0 = 0, theta = (1, 1) does work), since it means "classify as the positive class if and only if the sum of the features is positive", and that holds for every one of your points (a quick numerical check follows the listing):

    feature_matrix = np.array \
        (
            [
                [   1,    1],  # positive sample ; positive sum ; CORRECT
                [   2,    3],  # positive sample ; positive sum ; CORRECT
                [   3,    4],  # positive sample ; positive sum ; CORRECT
                [-0.5, -0.5],  # negative sample ; negative sum ; CORRECT
                [  -1,   -2],  # negative sample ; negative sum ; CORRECT
                [  -2,   -3],  # negative sample ; negative sum ; CORRECT
                [  -3,   -4],  # negative sample ; negative sum ; CORRECT
                [  -4,   -3]   # negative sample ; negative sum ; CORRECT
            ]
        )
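
    You can verify that claim numerically with a quick check (reusing the feature_matrix and labels defined in the question):

    theta = np.array([1.0, 1.0])
    theta_0 = 0

    # sign of the decision function for every sample; it matches the labels exactly
    predictions = np.sign(np.dot(feature_matrix, theta) + theta_0)
    print(np.array_equal(predictions, labels))  # True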
    

    After fixing theta_0 you will also end up with a different solution, theta = (1.5, 1.5) and theta_0 = 0, which is also correct. In general there are infinitely many solutions that the perceptron might end up with.

    Note that the classical perceptron algorithm converges (not just numerically: it actually stops and achieves 100% training accuracy), with no learning rates or similar tricks, as long as your data is linearly separable, and this dataset clearly is.

    So the final code will look something like this:

    def perceptron(feature_matrix, labels, convergence=True):
    
        # initialize theta and theta_0 to zero
        theta = np.zeros(len(feature_matrix[0]))
        theta_0 = 0
    
        while convergence:
            # FIRST FIX, initialisation outside the loop
            exit = True
            for i in range(len(feature_matrix)):
                # SECOND FIX, theta_0 used in error criterion
                if (labels[i] * (np.dot(feature_matrix[i], theta) + theta_0) <= 0):
                    theta   = theta   + labels[i] * feature_matrix[i]
                    theta_0 = theta_0 + labels[i]
    
                    exit = False
    
            if exit: break
    
        return (theta, theta_0)
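
    Running it on the data from the question should then reproduce the solution mentioned above:

    parameters = perceptron(feature_matrix, labels)

    print('Theta_0:', parameters[-1])
    print('Theta  :', parameters[0].tolist())

    # Theta_0: 0
    # Theta  : [1.5, 1.5]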