matlab machine-learning logistic-regression gradient-descent

Stochastic Gradient Descent for Logistic Regression always returns a cost of Inf and the weight vector never converges


I am trying to implement a logistic regression solver in MATLAB, finding the weights by stochastic gradient descent. I am running into a problem where my data always produces an infinite cost, and no matter what happens it never goes down.

Here is my gradient descent function:

function weightVector = logisticWeightsByGradientDescentStochastic(trueClass,features)
    %% This function attempts to converge on the best set of weights for a first-order logistic regression
    %% Input:
    % trueClass - the training data's vector of true class values
    % features - matrix of feature values, one row per training sample
    %% Output:
    % weightVector - vector of size n+1 (n is number of features)
    % corresponding to convergent weights
    
    %% Get Data Size
    dataSize = size(features);
    
    %% Initial pick for weightVector
    weightVector = zeros(dataSize(2)+1, 1) %create a zero vector of length (number of features + 1)
    
    %% Choose learning Rate
    learningRate = 0.0001;
    
    %% Initial Cost
    cost = logisticCost(weightVector, features, trueClass)
    
    
    %% Stochastic Gradient Descent
    costThresh = 0.05 %define cost threshold
    
    iterCount = 0;
    while(cost > costThresh)
        for m=1:dataSize(1) %for all samples
            
            %% test Statement
            curFeatures = transpose([1.0 features(m,:)])
            
            %% calculate the sigmoid prediction for this sample
            predictedClass = evaluateSigmoid(weightVector , [1.0 features(m,:)] )

            %% test Statement
            truth = trueClass(m)
                        
            %% Calculate gradient for all features
            gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])

            %% Update weight vector by subtracting the gradient from the old weight vector
            weightVector = weightVector - gradient 
            
            %% Re-evaluate Cost with new weight vector
            cost = logisticCost(weightVector, features, trueClass)
            
            if(cost < costThresh)
                break
            end
            iterCount = iterCount + 1
            
        end %for m
    end %while cost > 0.05
    
    weightVector
    iterCount
end

And here is my cost function:

function cost = logisticCost(weightVector, features, trueClass)
    %% Calculates the total cost of applying weightVector to all samples
    %% for a logistic regression model according to
    %% J(theta) = -(1/m) * sum[ trueClass*log(predictedClass) + (1 - trueClass)*log(1 - predictedClass) ]
    %% Input:
    % weightVector - vector of n+1 weights, where n is the number of features
    % features - matrix of features
    % trueClass - the training data's true class
    %% Output:
    % cost - the total cost
   
    dataSize = size(features); %get size of data
    
    errorSum = 0.0; %stores sum of errors
    for m = 1:dataSize(1) %for each row
        predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
        if trueClass(m) == 1
            errorSum = errorSum + log(predictedClass);
        else
            errorSum = errorSum + log(1 - predictedClass);
        end
    end
        
    cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get cost
end

Both of these seem perfectly fine to me; I can't imagine why my cost function would ALWAYS return Inf.

Here is my training data, where the first column is the class (either 1 or 0) and the next seven columns are the features I am trying to regress on.


Solution

  • Your gradient has the wrong sign:

    gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)]) 
    

    It should be:

    gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])
    

    See Andrew Ng's note for details: http://cs229.stanford.edu/notes/cs229-notes1.pdf

    The gradient of the cost with respect to the j-th parameter (where h(x) is the logistic function, y is the true label, and x is the feature vector) is:

        dJ/d(theta_j) = (h(x) - y) * x_j

    Otherwise, by subtracting that flipped gradient you are actually doing gradient ascent: every update increases the cost instead of decreasing it. I believe that is why you eventually get an infinite cost: once the sigmoid output saturates to exactly 0 or 1 for a misclassified sample, log(0) in the cost function evaluates to -Inf, so the cost becomes Inf. Because the cost then never drops below costThresh, the while loop never exits (see the corrected loop sketched below).

    The update rule should still be:

    weightVector = weightVector - gradient
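
Putting the fix together, here is a minimal sketch of the inner loop of logisticWeightsByGradientDescentStochastic with only the gradient sign corrected (everything else is unchanged from the question; it assumes evaluateSigmoid(w, x) returns 1/(1 + exp(-x*w)), since that helper is not shown):

    for m = 1:dataSize(1)                                   % for all samples
        x = [1.0 features(m,:)];                            % augmented feature row (bias term first)
        predictedClass = evaluateSigmoid(weightVector, x);  % assumed: 1/(1 + exp(-x*weightVector))

        % learning-rate-scaled per-sample gradient: learningRate * (h(x) - y) * x'
        gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose(x);

        % gradient descent step: subtract the gradient
        weightVector = weightVector - gradient;
    end
    cost = logisticCost(weightVector, features, trueClass);

With this sign, each stochastic update moves weightVector in a direction that lowers the cost, so the cost can actually fall toward costThresh instead of growing.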