[SOLVED] How do I format IRIS data set for input to SVM-Light library?

I'm trying to use the SVM-Light library for training and classification of the IRIS dataset. I'm using a Python wrapper for it and following the example on its page, but I'm not sure how to format the IRIS data correctly for input. A sample row in the IRIS dataset looks like 5.0,3.6,1.4,0.2,Iris-setosa.

Solution

I don't know your library, but I highly recommend scikit-learn, a powerful general-purpose ML library. I assume you have a good reason to use svmlight; otherwise, libsvm- or liblinear-based SVMs are much easier to use through sklearn (nothing is file-based, and multi-class handling is automatic).
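To give an idea of what the sklearn route looks like, here is a minimal sketch that trains a libsvm-based SVC on all three Iris classes at once. The split ratio, the linear kernel and C=1.0 are arbitrary choices for illustration, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()

# Hold out 30% of the samples for testing (ratio chosen arbitrarily)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# SVC wraps libsvm and handles the three classes without extra work
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

No intermediate files are involved; the arrays go straight into fit and score.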

Here is a simple example for svmlight. Keep in mind that, as far as I know, only binary targets are supported; if you need multi-class learning, you would use sklearn's multiclass tools.

Code to load and prepare Iris

from sklearn.datasets import load_iris, dump_svmlight_file

iris = load_iris()
X = iris.data
y = iris.target

# Only keep the first two classes (targets 0 and 1)
indices = y <= 1
X = X[indices]
y = y[indices]

# Transform to +1 / -1 targets (0 -> -1)
y[y == 0] = -1

# svmlight expects 1-based feature indices, hence zero_based=False
dump_svmlight_file(X, y, 'my_dataset', zero_based=False)
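
For reference, each line of an svmlight input file has the form "target index:value index:value ...", with feature indices starting at 1. If you would rather convert raw CSV rows like the one in the question by hand, a small sketch could look like this. The function name and the choice of Iris-setosa as the positive class are my own assumptions for illustration.

```python
def iris_row_to_svmlight(row, positive_label="Iris-setosa"):
    """Convert one CSV row of the Iris data to an svmlight-format line.

    Hypothetical helper: the positive class is an arbitrary choice.
    """
    # The last field is the class name, everything before it is a feature
    *values, label = row.strip().split(",")
    target = "+1" if label == positive_label else "-1"
    # Feature indices are 1-based and must appear in increasing order
    features = " ".join(
        f"{i}:{float(v)}" for i, v in enumerate(values, start=1)
    )
    return f"{target} {features}"


line = iris_row_to_svmlight("5.0,3.6,1.4,0.2,Iris-setosa")
print(line)  # +1 1:5.0 2:3.6 3:1.4 4:0.2
```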

svmlight call

./svm_learn my_dataset my_output -v3
Scanning examples...done
Reading examples into memory...100..OK. (100 examples read)
Setting default regularization parameter C=0.0199
Optimizing...............done. (16 iterations)
Optimization finished (0 misclassified, maxdiff=0.00057).
Runtime in cpu-seconds: 0.00
Number of SV: 32 (including 28 at upper bound)
L1 loss: loss=4.89469
Norm of weight vector: |w|=0.69732
Norm of longest example vector: |x|=9.13674
Estimated VCdim of classifier: VCdim<=31.50739
Computing XiAlpha-estimates...done
Runtime for XiAlpha-estimates in cpu-seconds: 0.00
XiAlpha-estimate of the error: error<=30.00% (rho=1.00,depth=0)
XiAlpha-estimate of the recall: recall=>70.00% (rho=1.00,depth=0)
XiAlpha-estimate of the precision: precision=>70.00% (rho=1.00,depth=0)
Number of kernel evaluations: 1291
Writing model file...done
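
If you want to stay with svmlight for all three classes, one possible approach (my suggestion, not something the library does for you) is one-vs-rest: write one binary file per class, with that class as +1 and everything else as -1, then train one model per file. A sketch, where the temporary directory and the file names are placeholders:

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_iris

iris = load_iris()
X, y = iris.data, iris.target

out_dir = tempfile.mkdtemp()  # placeholder output location
paths = []
for cls, name in enumerate(iris.target_names):
    # Current class becomes +1, the other two classes become -1
    y_binary = np.where(y == cls, 1, -1)
    path = os.path.join(out_dir, f"iris_{name}_vs_rest.dat")
    dump_svmlight_file(X, y_binary, path, zero_based=False)
    paths.append(path)

print(paths)
```

Each resulting file can be passed to svm_learn the same way as my_dataset above. To classify a new sample you would then pick the class whose model gives the largest decision value.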