I have an optimization problem which I modeled using the cvxpy library. However, when I run the solve method, I notice that the cost value increases after each iteration, even though the problem is defined as a minimization problem.
Let's consider the following example from the cvxpy documentation itself:
import cvxpy
x = cvxpy.Variable(2)
objective = cvxpy.Minimize(x[0] + cvxpy.norm(x, 1))
constraints = [x >= 2]
problem = cvxpy.Problem(objective, constraints)
problem.solve(solver=cvxpy.ECOS, verbose=True)
print("Optimal value:", problem.value)
print("Object value:", objective.value)
And here is the result:
ECOS 2.0.10 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +6.667e-01 +7.067e-01 +6e+00 6e-01 1e-02 1e+00 9e-01 --- --- 1 1 - | - -
1 +3.500e+00 +3.925e+00 +1e+00 3e-01 4e-03 8e-01 2e-01 0.9890 2e-01 1 1 1 | 0 0
2 +5.716e+00 +5.825e+00 +2e-01 6e-02 8e-04 2e-01 4e-02 0.9091 8e-02 1 1 1 | 0 0
3 +5.997e+00 +5.998e+00 +3e-03 7e-04 1e-05 2e-03 5e-04 0.9881 1e-04 1 1 1 | 0 0
4 +6.000e+00 +6.000e+00 +3e-05 8e-06 1e-07 3e-05 5e-06 0.9890 1e-04 1 1 1 | 0 0
5 +6.000e+00 +6.000e+00 +3e-07 9e-08 1e-09 3e-07 6e-08 0.9890 1e-04 1 0 0 | 0 0
6 +6.000e+00 +6.000e+00 +4e-09 1e-09 1e-11 3e-09 6e-10 0.9890 1e-04 1 0 0 | 0 0
OPTIMAL (within feastol=9.9e-10, reltol=6.2e-10, abstol=3.7e-09).
Runtime: 0.000061 seconds.
Optimal value: 5.999999996660147
Object value: 5.999999996660147
Do I have a wrong understanding of the value of pcost?
Well, the answer to that is quite simple: those intermediate iterates are not valid solutions. The constraints are not yet satisfied.
I think you understand that, anyway, the cost can't be below 6. What we are trying to find here is a pair (x, y) such that x + |x| + |y| is minimal, subject to x ≥ 2 and y ≥ 2. Since x ≥ 2 > 0, |x| is just x, and likewise |y| is y. So the objective is 2x + y, which obviously increases with both x and y (and x can't be negative anyway). So clearly, under the constraints, there can't be a better solution than (2, 2), for which pcost is 6.
Any "solution" with a pcost lower than 6 therefore can't be a feasible solution.
The same goes for dcost (which is a bit harder to compute manually, since it is the cost of the dual problem, with extra variables and a Lagrangian).
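That said, you don't have to compute the dual by hand: after a solve, cvxpy recovers the dual variables of your original constraints from the solver. A minimal sketch, reusing the problem and constraints from your snippet (for this problem, KKT stationarity predicts multipliers near [2, 1]; see the hand computation at the end of this answer):
problem.solve(solver=cvxpy.ECOS)
print(constraints[0].dual_value)  # one multiplier per scalar inequality x[i] >= 2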
What is more interesting in your logs is pres, which quantifies how much the constraints (the inequality ones) are still violated.
So what happens during the iterations is that the solver starts with a small pcost but an infeasible point, then looks for the cheapest increase in cost that satisfies the constraints as fast as possible. And it manages to do so: by the time pres drops under the acceptable threshold for constraint violation, pcost has only reached 6, which, as we know, is the best feasible value.
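To make that concrete, here is a rough sketch of measuring the infeasibility of an intermediate point yourself (ECOS's pres is a scaled residual of the whole conic system, not literally this, but the idea is the same; the point [1.36, 1.24] is the 1-iteration result shown further down):
import numpy as np

x_candidate = np.array([1.36, 1.24])          # an intermediate, infeasible point
violation = np.maximum(0.0, 2 - x_candidate)  # how far below the bound x >= 2
print(violation)                              # [0.64 0.76]: both constraints violated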
If you want to see a partial solution, well, that is not easy, because if you just pass max_iters=3 (for example), the solver will either stop before 3 iterations because everything is already within tolerance, or fail and show nothing if it isn't.
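A sketch of what that failure looks like (with this problem, which needs 6 iterations, capping ECOS at 3 should trip its maximum-iterations exit, which cvxpy surfaces as a SolverError rather than an intermediate point):
try:
    problem.solve(solver=cvxpy.ECOS, verbose=True, max_iters=3)
except cvxpy.error.SolverError as e:
    print(e)  # ECOS hit max_iters; no intermediate solution is exposed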
So the trick is to play with the tolerances. For example:
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=1, feastol=1)
problem.solution.primal_vars
# [1.36, 1.24]
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=1, feastol=0.1)
problem.solution.primal_vars
# [1.888, 2.019]
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=0.01, feastol=0.1)
problem.solution.primal_vars
# [1.999, 2.000]
(I just tweaked the tolerance parameters to get 1, then 2, then 3 iterations. There is no rule for that; it is just trial and error.)
Note that the pcost returned by the function is exactly the one you could compute yourself from the primal variables. For example, after 2 iterations it returns 5.79, which is indeed 1.888 + |1.888| + |2.019|.
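In plain Python:
x1, y1 = 1.888, 2.019
print(x1 + abs(x1) + abs(y1))  # 5.795, the reported cost up to rounding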
The pcost that is displayed in the verbose log is always a bit smaller (though it obviously follows the same path). I don't know exactly what it is, but it probably has something to do with an intermediate step.
But it doesn't matter. The point is that the strategy is to try, incrementally, step by step, to fulfill the inequalities while keeping pcost at a minimum. At first, pcost is small, but the inequalities don't hold. We then sacrifice more and more pcost until the inequalities are finally satisfied (along a trajectory that ensures that, when we get there, it was the least pcost we could have sacrificed).
It is a bit more complicated than that, because I have described this as if it were the primal variables and pcost that we were playing with iteratively, when in reality it is a Lagrangian that is minimized. But that is the said "trajectory": we know that when we have reached the minimum of the Lagrangian, it is the best we can do.
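For this tiny problem you can even follow that by hand, using the simplified objective 2x + y from above (a sketch; the multiplier names l1 and l2 are mine):
# Lagrangian of: minimize 2x + y  subject to  x >= 2, y >= 2
#   L(x, y, l1, l2) = 2x + y - l1*(x - 2) - l2*(y - 2),   with l1, l2 >= 0
# Stationarity: dL/dx = 2 - l1 = 0   and   dL/dy = 1 - l2 = 0
l1, l2 = 2.0, 1.0
dual_value = 2 * l1 + 2 * l2  # the constant terms left in L at stationarity
print(dual_value)             # 6.0: exactly where pcost and dcost meet in the log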