I'm using the Matlab command fitcdiscr to implement an LDA with 379 features and 8 classes. I would like to get a global weight for each feature, to investigate their influence on the prediction. How can I obtain it from the pairwise (one per pair of classes) coefficients in the Coeffs field of the ClassificationDiscriminant object?
It looks like fitcdiscr does not output the eigenvalues or the eigenvectors.
I'm not going to explain what eigenvectors and eigenvalues are here, since there is plenty of documentation on the web. But in short, the computed eigenvectors determine the axes that maximize the separation between the classes.
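Concretely (this is the standard textbook formulation, not something fitcdiscr exposes directly): with $S_W$ the within-class scatter matrix and $S_B$ the between-class scatter matrix, LDA finds the directions $w$ that solve the generalized eigenvalue problem

$$S_W^{-1} S_B \, w = \lambda w,$$

and the eigenvectors with the largest eigenvalues $\lambda$ are the most discriminative axes.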
I've written a minimal example (inspired by this excellent article) that outputs both of them:
% Load the fisheriris dataset
load fisheriris
feature = meas;    % 150x4 array
classes = species; % 150x1 cell array of labels ("class" would shadow a built-in)
% Extract the unique classes and the corresponding index of each observation
[ucl,~,idc] = unique(classes);
% Number of features and number of classes
np = size(feature,2);
nc = length(ucl);
% Mean of each feature, computed per class
MBC = splitapply(@mean,feature,idc);
% Compute the within-class scatter matrix WSM
WSM = zeros(np);
for ii = 1:nc
    FM = feature(idc==ii,:) - MBC(ii,:); % center each class on its own mean
    WSM = WSM + FM.'*FM;
end
WSM
% Compute the between-class scatter matrix BSM
BSM = zeros(np);
GPC = accumarray(idc,1); % number of observations per class
for ii = 1:nc
    BSM = BSM + GPC(ii)*((MBC(ii,:)-mean(feature)).'*(MBC(ii,:)-mean(feature)));
end
BSM
% Now we compute the eigenvalues and the eigenvectors.
% WSM\BSM is preferred over inv(WSM)*BSM for numerical stability.
[eig_vec,eig_val] = eig(WSM\BSM);
% eig does not guarantee any ordering, so sort by decreasing eigenvalue
[eig_val,order] = sort(diag(eig_val),'descend');
eig_vec = eig_vec(:,order)
eig_val
% Compute the new features:
new_feature = feature*eig_vec
With:
eig_vec =
[-0.2087 -0.0065 0.7666 -0.4924 % -> feature 1
-0.3862 -0.5866 -0.0839 0.4417 % -> feature 2
0.5540 0.2526 -0.0291 0.2875 % -> feature 3
0.7074 -0.7695 -0.6359 -0.5699] % -> feature 4
% So the first new feature is the linear combination:
% -0.2087*feature1 - 0.3862*feature2 + 0.5540*feature3 + 0.7074*feature4
eig_val =
[ 32.1919 % eigenvalue of new feature 1
0.2854 % eigenvalue of new feature 2
0.0000 % eigenvalue of new feature 3
-0.0000] % eigenvalue of new feature 4
Only the first two eigenvalues are non-zero (up to round-off): with nc = 3 classes, BSM has rank at most nc - 1 = 2, so LDA can produce at most 2 useful discriminant axes.
In this case we have 4 features; here is the histogram of these 4 features (1 class = 1 color):
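Such per-class histograms can be produced with something like the following sketch (reusing feature, idc, np and nc from the code above; the 2x2 subplot grid assumes np = 4):

% One subplot per original feature, one histogram color per class
figure
for ii = 1:np
    subplot(2,2,ii); hold on
    for jj = 1:nc
        histogram(feature(idc==jj,ii))
    end
    title(sprintf('feature %d',ii))
end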
We see that features 3 and 4 are pretty good if we want to distinguish the different classes, but not perfect.
Now, after the LDA, we have the new features (same plot, applied to new_feature, as sketched below):
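% Same loop as above, now on the LDA-projected data
figure
for ii = 1:np
    subplot(2,2,ii); hold on
    for jj = 1:nc
        histogram(new_feature(idc==jj,ii))
    end
    title(sprintf('new feature %d',ii))
end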
And we see that almost all of the information has been gathered in the first new feature (new feature 1). All the other features are pretty useless, so we can keep only new feature 1 and remove the others. We now have a 1D dataset instead of a 4D dataset.
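To come back to the original question about a global weight per feature: one possible heuristic (my suggestion, not an official MATLAB metric) is to combine the absolute eigenvector coefficients, weighting each discriminant axis by its eigenvalue:

% Heuristic global weight per original feature:
% weight each axis by its (clipped, normalized) eigenvalue and
% sum the absolute contribution of each feature across the axes
ev = max(eig_val,0); % clip tiny negative round-off eigenvalues
global_weight = abs(eig_vec) * (ev/sum(ev)) % np x 1 vector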