I am trying to implement a logistic regression solver in MATLAB, finding the weights by stochastic gradient descent. I am running into a problem where my data seems to produce an infinite cost, and no matter what I do it never goes down...
Here is my gradient descent function:
function weightVector = logisticWeightsByGradientDescentStochastic(trueClass,features)
%% This function attempts to converge on the best set of weights for a first-order logistic regression
%% Input:
% trueClass - the training data's vector of true class values
% features - matrix of training features (one sample per row)
%% Output:
% weightVector - vector of size n+1 (n is number of features)
% corresponding to convergent weights
%% Get Data Size
dataSize = size(features);
%% Initial pick for weightVector
weightVector = zeros(dataSize(2)+1, 1) %create a zero vector of length equal to the number of features plus 1
%% Choose learning Rate
learningRate = 0.0001;
%% Initial Cost
cost = logisticCost(weightVector, features, trueClass)
%% Stochastic Gradient Descent
costThresh = 0.05 %define cost threshold
iterCount = 0;
while(cost > costThresh)
    for m = 1:dataSize(1) %for all samples
        %% test Statement
        curFeatures = transpose([1.0 features(m,:)])
        %% calculate Sigmoid predicted
        predictedClass = evaluateSigmoid(weightVector , [1.0 features(m,:)] )
        %% test Statement
        truth = trueClass(m)
        %% Calculate gradient for all features
        gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
        %% Update weight vector by subtracting the gradient from the old weight vector
        weightVector = weightVector - gradient
        %% Re-evaluate Cost with new weight vector
        cost = logisticCost(weightVector, features, trueClass)
        if(cost < costThresh)
            break
        end
        iterCount = iterCount + 1
    end %for m
end %while cost > 0.05
weightVector
iterCount
end
And here is my cost function:
function cost = logisticCost(weightVector, features, trueClass)
%% Calculates the total cost of applying weightVector to all samples
%% for a logistic regression model according to
%% J(theta) = -(1/m) * sum[ trueClass*log(predictedClass) + (1-trueClass)*log(1-predictedClass) ]
%% Input:
% weightVector - vector of n+1 weights, where n is the number of features
% features - matrix of features
% trueClass - the training data's true class
%% Output:
% cost - the total cost
dataSize = size(features); %get size of data
errorSum = 0.0; %stores sum of errors
for m = 1:dataSize(1) %for each row
    predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]); %evaluate the sigmoid to predict a class for sample m
    if trueClass(m) == 1
        errorSum = errorSum + log(predictedClass);
    else
        errorSum = errorSum + log(1 - predictedClass);
    end
end
cost = errorSum / (-1 .* dataSize(1)); %multiply by -(1/m) to get cost
end
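For reference, evaluateSigmoid (not shown above) just evaluates the sigmoid of the weighted feature sum; a minimal sketch of what it does is:
function probability = evaluateSigmoid(weightVector, augmentedFeatureRow)
%% Sketch only: sigmoid of the weighted sum for one sample (row already prefixed with the bias term 1.0)
z = augmentedFeatureRow * weightVector; %1-by-(n+1) row times (n+1)-by-1 column gives a scalar
probability = 1 / (1 + exp(-z));
end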
Both of these seem perfectly fine to me; I can't imagine why my cost function would ALWAYS return infinite.
Here is my training data, where the first column is the class (either 1 or 0) and the next seven columns are the features I am trying to regress on.
Your gradient has the wrong sign:
gradient = learningRate .* (trueClass(m) - predictedClass) .* transpose([1.0 features(m,:)])
It should be:
gradient = learningRate .* (predictedClass - trueClass(m)) .* transpose([1.0 features(m,:)])
See Andrew Ng's note for details: http://cs229.stanford.edu/notes/cs229-notes1.pdf
For a single training sample, the gradient of the cost with respect to the j-th parameter is (h(x) - y) * x_j, where h(x) is the logistic function, y is the true label, and x is the feature vector.
Otherwise, when you subtract that gradient you are actually doing gradient ascent on the cost. I believe that's why you eventually get an infinite cost: the weights keep moving in the wrong direction, the sigmoid saturates, and the cost never drops below your threshold, so it's a dead loop and you never get out of it.
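A quick way to see where the Inf comes from (just an illustration at the MATLAB prompt): once the weighted sum gets large enough, the sigmoid rounds to exactly 1 in double precision, and the log(1 - predictedClass) term in logisticCost becomes -Inf.
p = 1 / (1 + exp(-40))  %prints 1: exp(-40) is smaller than the double-precision spacing near 1
log(1 - p)              %prints -Inf, which is what makes logisticCost return Inf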
The update rule should still be:
weightVector = weightVector - gradient
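With that sign fix, the inner update for one sample would look like this (just a sketch of the corrected step, reusing the names from your code):
for m = 1:dataSize(1)
    x = transpose([1.0 features(m,:)]); %augmented feature column for sample m
    predictedClass = evaluateSigmoid(weightVector, [1.0 features(m,:)]);
    gradient = learningRate .* (predictedClass - trueClass(m)) .* x; %note: predicted minus true
    weightVector = weightVector - gradient; %now this steps downhill on the cost
end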