Tags: python, scikit-learn, libsvm, liblinear

Why can't LinearSVC do this simple classification?


I'm trying to do the following simple classification using the LinearSVC class in scikit-learn. I've tried both versions 0.10 and 0.14. With this code:

from sklearn.svm import LinearSVC, SVC
from numpy import array  # import only the name used below instead of a wildcard

data = array([[ 1007.,  1076.],
              [ 1017.,  1009.],
              [ 2021.,  2029.],
              [ 2060.,  2085.]])
groups = array([1, 1, 2, 2])

svc = LinearSVC()
svc.fit(data, groups)
svc.predict(data)

I get the output:

array([2, 2, 2, 2])

However, if I replace the classifier with

svc = SVC(kernel='linear')

then I get the result

array([ 1.,  1.,  2.,  2.])

which is correct. Does anyone know why using LinearSVC would botch this simple problem?


Solution

  • The algorithm underlying LinearSVC is very sensitive to the scale of its input features. With values in the thousands, as here, the solver hits its iteration limit without converging:

    >>> svc = LinearSVC(verbose=1)
    >>> svc.fit(data, groups)
    [LibLinear]....................................................................................................
    optimization finished, #iter = 1000
    
    WARNING: reaching max number of iterations
    Using -s 2 may be faster (also see FAQ)
    
    Objective value = -0.001256
    nSV = 4
    LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
         intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
         random_state=None, tol=0.0001, verbose=1)
    

    (The warning refers to the LibLinear FAQ, since scikit-learn's LinearSVC is based on that library.)

    Standardize the features (zero mean, unit variance) before fitting, and the solver converges in a few dozen iterations:

    >>> from sklearn.preprocessing import scale
    >>> data = scale(data)
    >>> svc.fit(data, groups)
    [LibLinear]...
    optimization finished, #iter = 39
    Objective value = -0.240988
    nSV = 4
    LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
         intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
         random_state=None, tol=0.0001, verbose=1)
    >>> svc.predict(data)
    array([1, 1, 2, 2])
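
If you would rather not overwrite `data` with its scaled copy, one option is to bundle the scaling and the classifier together. The following is only a minimal sketch, assuming a scikit-learn release that provides `make_pipeline` and `StandardScaler` (neither appears in the code above, and the 0.10 release mentioned in the question is too old for this). The name `model` and the choice of `random_state=0` are mine, added for illustration and reproducibility. On this data it should reproduce the `[1, 1, 2, 2]` result shown above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Same four points and labels as in the question.
data = np.array([[1007., 1076.],
                 [1017., 1009.],
                 [2021., 2029.],
                 [2060., 2085.]])
groups = np.array([1, 1, 2, 2])

# StandardScaler learns per-feature mean and standard deviation during fit
# and applies the same transformation again inside predict, so the raw
# (unscaled) data can be passed in both places.
model = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
model.fit(data, groups)

predictions = model.predict(data)
print(predictions)
```

The benefit of the pipeline is that new samples passed to `model.predict` are scaled with the statistics learned from the training set, so there is no need to remember to call `scale` separately, and no risk of scaling test data by its own mean and variance.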