I have an optimization problem which I modeled using the cvxpy library. However, when I run the solve method, I notice that the cost value increases after each iteration, even though the problem is defined as a minimization problem.
Let's consider the following example from the cvxpy documentation itself:
import cvxpy
x = cvxpy.Variable(2)
objective = cvxpy.Minimize(x[0] + cvxpy.norm(x, 1))
constraints = [x >= 2]
problem = cvxpy.Problem(objective, constraints)
problem.solve(solver=cvxpy.ECOS, verbose=True)
print("Optimal value:", problem.value)
print("Object value:", objective.value)
And here is the result:
ECOS 2.0.10 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +6.667e-01 +7.067e-01 +6e+00 6e-01 1e-02 1e+00 9e-01 --- --- 1 1 - | - -
1 +3.500e+00 +3.925e+00 +1e+00 3e-01 4e-03 8e-01 2e-01 0.9890 2e-01 1 1 1 | 0 0
2 +5.716e+00 +5.825e+00 +2e-01 6e-02 8e-04 2e-01 4e-02 0.9091 8e-02 1 1 1 | 0 0
3 +5.997e+00 +5.998e+00 +3e-03 7e-04 1e-05 2e-03 5e-04 0.9881 1e-04 1 1 1 | 0 0
4 +6.000e+00 +6.000e+00 +3e-05 8e-06 1e-07 3e-05 5e-06 0.9890 1e-04 1 1 1 | 0 0
5 +6.000e+00 +6.000e+00 +3e-07 9e-08 1e-09 3e-07 6e-08 0.9890 1e-04 1 0 0 | 0 0
6 +6.000e+00 +6.000e+00 +4e-09 1e-09 1e-11 3e-09 6e-10 0.9890 1e-04 1 0 0 | 0 0
OPTIMAL (within feastol=9.9e-10, reltol=6.2e-10, abstol=3.7e-09).
Runtime: 0.000061 seconds.
Optimal value: 5.999999996660147
Object value: 5.999999996660147
Do I have a wrong understanding of the value of pcost?
Well, the answer to that is quite simple: those intermediate iterates are not valid solutions. The constraints are not yet satisfied.
I think you understand that, anyway, the cost can't be below 6. What we are trying to find here is a pair (x, y) such that x + |x| + |y| is minimal, subject to x ≥ 2 and y ≥ 2. Since x ≥ 2 > 0, |x| is just x, and likewise |y| is y. So the objective is 2x + y, which obviously increases with both x and y (and x can't be negative anyway). So clearly, under the constraints, there can't be a better solution than (2, 2), for which pcost is 6.
Any "solution" with a pcost lower than 6 therefore can't be a feasible solution.
The same goes for dcost (which is a bit harder to compute manually, since it is the cost of the dual problem, with extra variables and a Lagrangian).
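That said, you don't have to compute the dual by hand: after a solve, cvxpy recovers the dual variables of your original constraints from the solver. A minimal sketch, reusing the problem and constraints from your snippet (for this problem, KKT stationarity predicts multipliers near [2, 1]; see the hand computation at the end of this answer):
problem.solve(solver=cvxpy.ECOS)
print(constraints[0].dual_value)  # one multiplier per scalar inequality x[i] >= 2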
What is more interesting in your logs is pres, which quantifies how much the constraints (the inequality ones) are still violated.
So what happens during the iterations is that the solver starts with a small pcost but an infeasible point, then looks for the cheapest increase in cost that satisfies the constraints as fast as possible. And it manages to do so: by the time pres drops under the acceptable threshold for constraint violation, pcost has only reached 6, which, as we know, is the best feasible value.
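To make that concrete, here is a rough sketch of measuring the infeasibility of an intermediate point yourself (ECOS's pres is a scaled residual of the whole conic system, not literally this, but the idea is the same; the point [1.36, 1.24] is the 1-iteration result shown further down):
import numpy as np

x_candidate = np.array([1.36, 1.24])          # an intermediate, infeasible point
violation = np.maximum(0.0, 2 - x_candidate)  # how far below the bound x >= 2
print(violation)                              # [0.64 0.76]: both constraints violated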
If you want to see a partial solution, well, that is not easy, because if you just pass max_iters=3 (for example), the solver will either stop before 3 iterations because everything is already within tolerance, or fail and show nothing if it isn't.
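A sketch of what that failure looks like (with this problem, which needs 6 iterations, capping ECOS at 3 should trip its maximum-iterations exit, which cvxpy surfaces as a SolverError rather than an intermediate point):
try:
    problem.solve(solver=cvxpy.ECOS, verbose=True, max_iters=3)
except cvxpy.error.SolverError as e:
    print(e)  # ECOS hit max_iters; no intermediate solution is exposed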
So the trick is to play with the tolerances. For example:
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=1, feastol=1)
problem.solution.primal_vars
# [1.36, 1.24]
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=1, feastol=0.1)
problem.solution.primal_vars
# [1.888, 2.019]
problem.solve(solver=cvxpy.ECOS, verbose=True, reltol=0.01, feastol=0.1)
problem.solution.primal_vars
# [1.999, 2.000]
(I just tweaked the tolerance parameters to get 1, then 2, then 3 iterations. There is no rule for that; it is just trial and error.)
Note that the pcost returned by the function is exactly the one you could compute yourself from the primal variables. For example, after 2 iterations it returns 5.79, which is indeed 1.888 + |1.888| + |2.019|.
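In plain Python:
x1, y1 = 1.888, 2.019
print(x1 + abs(x1) + abs(y1))  # 5.795, the reported cost up to rounding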
The pcost that is displayed in the verbose log is always a bit smaller (though it obviously follows the same path). I don't know exactly what it is, but it probably has something to do with an intermediate step.
But it doesn't matter. The point is that the strategy is to try, incrementally, step by step, to fulfill the inequalities while keeping pcost at a minimum. At first, pcost is small, but the inequalities don't hold. We then sacrifice more and more pcost until the inequalities are finally satisfied (along a trajectory that ensures that, when we get there, it was the least pcost we could have sacrificed).
It is a bit more complicated than that, because I have described this as if it were the primal variables and pcost that we were playing with iteratively, when in reality it is a Lagrangian that is minimized. But that is the said "trajectory": we know that when we have reached the minimum of the Lagrangian, it is the best we can do.
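For this tiny problem you can even follow that by hand, using the simplified objective 2x + y from above (a sketch; the multiplier names l1 and l2 are mine):
# Lagrangian of: minimize 2x + y  subject to  x >= 2, y >= 2
#   L(x, y, l1, l2) = 2x + y - l1*(x - 2) - l2*(y - 2),   with l1, l2 >= 0
# Stationarity: dL/dx = 2 - l1 = 0   and   dL/dy = 1 - l2 = 0
l1, l2 = 2.0, 1.0
dual_value = 2 * l1 + 2 * l2  # the constant terms left in L at stationarity
print(dual_value)             # 6.0: exactly where pcost and dcost meet in the log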