This article explains how to apply a multilayer perceptron to a classification task. I have input feature vectors of length 80 and would like to train a multilayer perceptron to classify the input vectors into two categories -- 0 and 1. My output layer contains one node.
Problem 1) I expect the classifier to produce binary output, but the model outputs real-valued numbers between 0 and 1. Why does this happen, and how can I transform the output into binary classes? The linked tutorial does not mention how to obtain binary-valued labels, i.e. which function to use in the last layer.
To clarify, here are the first four values of the calculated model output:
y =
    0.1042
    0.9961
    0.9956
    0.0049
I can apply a simple threshold function such as bin_target = y >= 0.5, where all numbers greater than or equal to 0.5 are labelled as one and the rest as zero. However, the manual choice of a threshold appears arbitrary to me.
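One way to make the threshold less arbitrary (a sketch of my own, not part of the tutorial) is to derive it from the ROC curve using roc from the Deep Learning Toolbox, for instance by maximizing Youden's J statistic; Y and y follow the variable names used in the code below (Y = binary targets, y = model output).
% Choose a data-driven threshold from the ROC curve (sketch).
% Y = 1-by-N binary targets, y = 1-by-N network scores.
[tpr,fpr,thresholds] = roc(Y,y);  % true/false positive rates per candidate threshold
[~,idx] = max(tpr - fpr);         % maximize Youden's J = TPR - FPR
bestThreshold = thresholds(idx);
bin_target = y >= bestThreshold;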
Problem 2) Calculation of MSE: To report performance, should the mean squared error (or the simple error) be calculated between the known binary-valued targets Y and the model's real-valued output y, or between the thresholded labels and the targets, i.e. perfClassify = y_bin - Y?
The code below is my attempt to classify the data inputs.
% Create a Pattern Recognition Network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize); % create the network before configuring it
net.performFcn = 'crossentropy';   % set the performance function
net = init(net);                   % (re)initialize the weights
% Train the Network
[net,tr] = train(net,inputs,Y); % Y = targets
% Test the Network
y = net(inputs);         % real-valued outputs between 0 and 1
bin_target = y >= 0.5;   % threshold to binary labels
error1 = bin_target - Y; % error of the thresholded labels
% OR
error2 = y - Y;          % error of the continuous outputs
Your Problem 1 happens because the default output transfer function is 'softmax', which is a continuous function (technically, it produces a probability distribution). Such output carries "confidence" information, not just the output class. In my opinion, 0.5 is the correct threshold for a binary classification problem, because to my understanding the output value means "the probability that this input corresponds to class true".
>> net = patternnet;
>> disp(net.layers{net.numLayers}.transferFcn);
softmax
Unfortunately, I cannot comment on whether softmax is suitable for your problem, but if you want to change it, you can find a list of options using help nntransfer:
>> help nntransfer
Neural Network Transfer Functions.
compet - Competitive transfer function.
elliotsig - Elliot sigmoid transfer function.
hardlim - Positive hard limit transfer function.
hardlims - Symmetric hard limit transfer function.
logsig - Logarithmic sigmoid transfer function.
netinv - Inverse transfer function.
poslin - Positive linear transfer function.
purelin - Linear transfer function.
radbas - Radial basis transfer function.
radbasn - Radial basis normalized transfer function.
satlin - Positive saturating linear transfer function.
satlins - Symmetric saturating linear transfer function.
softmax - Soft max transfer function.
tansig - Symmetric sigmoid transfer function.
tribas - Triangular basis transfer function.
Main nnet function list.
Perhaps what you're looking for is hardlim. To change the transfer function, simply assign a valid value to the transferFcn field of the last layer (e.g. net.layers{net.numLayers}.transferFcn = 'hardlim';).
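Keep in mind that hardlim is a step function with a zero derivative almost everywhere, so setting it before training may stall gradient-based learning. A minimal sketch (my own suggestion, reusing the question's inputs and Y) is to train with the continuous default and swap in hardlim only for prediction:
>> net = patternnet(10);
>> [net,tr] = train(net,inputs,Y);  % train with the continuous output function
>> net.layers{net.numLayers}.transferFcn = 'hardlim'; % binary outputs from here on
>> y_bin = net(inputs);             % each value is now exactly 0 or 1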
As for Problem 2, as explained in this answer, it is beneficial to use the continuous scores: they retain the model's confidence, so a confidently wrong prediction is penalized more heavily than a borderline one, whereas the thresholded error only counts misclassifications.
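For completeness, here is a minimal sketch (using the question's variable names) that computes both views: the score-based performance via perform, which evaluates whatever net.performFcn is configured ('crossentropy' in the question's code), and the hard misclassification rate after thresholding:
>> y = net(inputs);                          % continuous scores
>> perfScore = perform(net,Y,y);             % evaluates net.performFcn
>> bin_target = y >= 0.5;
>> errRate = sum(bin_target ~= Y)/numel(Y);  % fraction misclassified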