I found sklearn.svm.LinearSVC and sklearn.svm.SVC(kernel='linear'), and they seem very similar to me, but I get very different results on Reuters.
sklearn.svm.LinearSVC: 81.05% in 28.87s train / 9.71s test
sklearn.svm.SVC : 33.55% in 6536.53s train / 2418.62s test
Both have a linear kernel. The tolerance of LinearSVC is lower (stricter) than that of SVC:
LinearSVC(C=1.0, tol=0.0001, max_iter=1000, penalty='l2', loss='squared_hinge', dual=True, multi_class='ovr', fit_intercept=True, intercept_scaling=1)
SVC (C=1.0, tol=0.001, max_iter=-1, shrinking=True, probability=False, cache_size=200, decision_function_shape=None)
How do both functions differ otherwise? Even if I set kernel='linear', tol=0.0001, max_iter=1000 and decision_function_shape='ovr', the SVC takes much longer than LinearSVC. Why?
I use sklearn 0.18 and both are wrapped in the OneVsRestClassifier. I'm not sure whether this does the same thing as multi_class='ovr' / decision_function_shape='ovr'.
Indeed, LinearSVC and SVC(kernel='linear') yield different results, i.e. different metric scores and decision boundaries, because they use different approaches. The toy example below illustrates this:
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, SVC
X, y = load_iris(return_X_y=True)
clf_1 = LinearSVC().fit(X, y) # possible to state loss='hinge'
clf_2 = SVC(kernel='linear').fit(X, y)
score_1 = clf_1.score(X, y)
score_2 = clf_2.score(X, y)
print('LinearSVC score %s' % score_1)
print('SVC score %s' % score_2)
--------------------------
>>> 0.96666666666666667
>>> 0.98666666666666669
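The loss='hinge' comment in the example refers to the per-sample loss that LinearSVC minimizes; its default, loss='squared_hinge', appears in the parameter listing in the question. Below is a minimal sketch of the two losses as functions of the margin y * f(x), using the standard textbook definitions. The helper names hinge and squared_hinge are illustrative only and are not part of scikit-learn:

```python
import numpy as np


def hinge(margin):
    """Regular hinge loss: zero once the margin reaches 1, linear below it."""
    return np.maximum(0.0, 1.0 - margin)


def squared_hinge(margin):
    """Squared hinge loss: the same threshold, but violations grow quadratically."""
    return hinge(margin) ** 2


# Margins y * f(x) for a few hypothetical samples
# (negative = misclassified, >= 1 = correct with a full margin).
margins = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])

print(hinge(margins))          # values: 2, 1, 0.5, 0, 0
print(squared_hinge(margins))  # values: 4, 1, 0.25, 0, 0
```

The squared version punishes large margin violations much harder than small ones, which is one reason the two estimators can end up with different decision boundaries on the same data.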
The key differences are the following:
LinearSVC minimizes the squared hinge loss by default, while SVC minimizes the regular hinge loss. It is possible to pass loss='hinge' to LinearSVC to use the regular hinge loss instead.
LinearSVC uses the One-vs-All (also known as One-vs-Rest) multiclass reduction, while SVC uses the One-vs-One multiclass reduction. It is also noted here. For a multi-class classification problem, SVC fits N * (N - 1) / 2 models, where N is the number of classes. LinearSVC, by contrast, fits only N models. If the classification problem is binary, only one model is fit in both scenarios. The multi_class and decision_function_shape parameters have nothing in common. The second one is an aggregator that transforms the results of the decision function into a convenient shape of (n_samples, n_classes). multi_class is an algorithmic approach to establish a solution.
The underlying estimators for LinearSVC are liblinear, which do in fact penalize the intercept. SVC uses libsvm estimators, which do not. liblinear estimators are optimized for the linear (special) case and thus converge faster on large amounts of data than libsvm. That is why LinearSVC takes less time to solve the problem.
In fact, LinearSVC is not actually linear after the intercept scaling, as was stated in the comments section.
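The model counts described above can be checked on a dataset with more than three classes (with N = 3, as in iris, N and N * (N - 1) / 2 happen to coincide). The following is a rough sketch, not a benchmark: the digits dataset, the 500-sample subset, the pixel rescaling and max_iter=10000 are my own choices to keep the run short, and none of them come from the question.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC

# Small multi-class subset so that both solvers finish quickly.
X, y = load_digits(return_X_y=True)
X, y = X[:500] / 16.0, y[:500]  # pixel values range from 0 to 16
n_classes = len(np.unique(y))
n_pairs = n_classes * (n_classes - 1) // 2

# LinearSVC: one-vs-rest, so one weight vector per class.
lin = LinearSVC(max_iter=10000).fit(X, y)

# SVC trains one model per pair of classes, whatever decision_function_shape says.
svc_ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
svc_ovr = SVC(kernel="linear", decision_function_shape="ovr").fit(X, y)

# Wrapping SVC in OneVsRestClassifier does train N separate binary models.
wrapped = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print("classes:", n_classes, "pairs:", n_pairs)
print("LinearSVC coef_ rows:", lin.coef_.shape[0])            # N
print("SVC (ovo) coef_ rows:", svc_ovo.coef_.shape[0])        # N * (N - 1) / 2
print("SVC (ovr) coef_ rows:", svc_ovr.coef_.shape[0])        # still N * (N - 1) / 2
print("SVC (ovo) decision shape:", svc_ovo.decision_function(X).shape)
print("SVC (ovr) decision shape:", svc_ovr.decision_function(X).shape)
print("OneVsRestClassifier estimators:", len(wrapped.estimators_))  # N
```

Only the shape of the decision_function output changes between 'ovo' and 'ovr'; the number of fitted pairwise models, visible in coef_, stays the same. The OneVsRestClassifier wrapper, by contrast, changes what is actually trained.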