Suppose we are given a training dataset {yᵢ, xᵢ}
, for i = 1, ..., n
, where yᵢ
can either be -1
or 1
and xᵢ
can be e.g. a 2D or 3D point.
In general, when the input points are linearly separable, the SVM model can be defined as follows
min 1/2*||w||²
w,b
subject to the constraints (for i = 1, ..., n
)
yᵢ*(w*xᵢ - b) >= 1
This is often called the hard-margin SVM model, which is thus a constrained minimization problem, where the unknowns are w
and b
. We can also omit 1/2
in the function to be minimized, given it's just a constant.
Now, the documentation about Matlab's quadprog
states
x = quadprog(H, f, A, b)
minimizes1/2*x'*H*x + f'*x
subject to the restrictionsA*x ≤ b
.A
is a matrix of doubles, andb
is a vector of doubles.
We can implement the hard-margin SVM model using quadprog
function, to get the weight vector w
, as follows
H
becomes an identity matrix.f'
becomes a zeros matrix.A
is the left-hand side of the constraintsb
is equal to -1
because the original constraint had >= 1
, it becomes <= -1
when we multiply with -1
on both sides.Now, I am trying to implement a soft-margin SVM model. The minimization equation here is
min (1/2)*||w||² + C*(∑ ζᵢ)
w,b
subject to the constraints (for i = 1, ..., n
)
yᵢ*(w*xᵢ - b) >= 1 - ζᵢ
such that ζᵢ >= 0
, where ∑
is the summation symbol, ζᵢ = max(0, 1 - yᵢ*(w*xᵢ - b))
and C
is a hyper-parameter.
How can this optimization problem be solved using the Matlab's quadprog
function? It's not clear to me how the equation should be mapped to the parameters of the quadprog
function.
The "primal" form of the soft-margin SVM model (i.e. the definition above) can be converted to a "dual" form. I did that, and I am able to get the Lagrange variable values (in the dual form). However, I would like to know if I can use quadprog
to solve directly the primal form without needing to convert it to the dual form.
I don't see how it can be a problem. Let z
be our vector of (2n + 1)
variables:
z = (w, eps, b)
Then, H
becomes diagonal matrix with first n
values on the diagonal equal to 1
and the last n + 1
set to zero:
H = diag([ones(1, n), zeros(1, n + 1)])
Vector f
can be expressed as:
f = [zeros(1, n), C * ones(1, n), 0]'
First set of constrains becomes:
Aineq = [A1, eye(n), zeros(n, 1)]
bineq = ones(n, 1)
where A1
is a the same matrix as in primal form.
Second set of constraints becomes lower bounds:
lb = (inf(n, 1), zeros(n, 1), inf(n, 1))
Then you can call MATLAB:
z = quadprog(H, f, Aineq, bineq, [], [], lb);
P.S. I can be mistaken in some small details, but the general idea is right.