Tags: mathematical-optimization, nonlinear-optimization, ceres-solver

When to define multiple Residual blocks in Ceres?


I'm going through the Ceres Solver tutorial.

Powell's Function

Powell's function maps R^4 -> R^4, so it seems intuitive to define one residual block that takes in a 4-element array x and fills in a 4-element array residual.

Instead, the example in the tutorial defines 4 different residual blocks that map R^2 -> R^1.
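
For reference, the tutorial's decomposition looks roughly like this (a sketch following the Powell example in the Ceres docs; F2, F3 and F4 follow the same pattern):

struct F1 {
    template <typename T>
    bool operator()(const T* const x1, const T* const x2, T* residual) const {
        // F1 only ever sees the two parameter blocks it depends on.
        residual[0] = x1[0] + 10.0 * x2[0];
        return true;
    }
};

double x1 = 3.0, x2 = -1.0, x3 = 0.0, x4 = 1.0;  // initial guess from the tutorial

ceres::Problem problem;
// Each term becomes its own residual block over only its own parameters:
problem.AddResidualBlock(
    new ceres::AutoDiffCostFunction<F1, 1, 1, 1>(new F1), nullptr, &x1, &x2);
// ... and similarly for F2(x3, x4), F3(x2, x3), F4(x1, x4).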

Of course, if we are trying to minimize 1/2 || F(x) ||^2, then returning all four terms in a single residual vector implicitly yields the same solution as minimizing 1/2 || F(x) ||^2 directly (i.e. my suggestion is to return one residual vector F instead of F1...F4 separately). I have verified this with the cost functor below.

struct F {
    template <typename T>
    bool operator()(const T* const x, T* residual) const {
        // All four terms of Powell's function, stacked into one residual block.
        residual[0] = x[0] + 10.0 * x[1];
        residual[1] = T(sqrt(5.0)) * (x[2] - x[3]);
        residual[2] = (x[1] - 2.0 * x[2]) * (x[1] - 2.0 * x[2]);
        residual[3] = T(sqrt(10.0)) * (x[0] - x[3]) * (x[0] - x[3]);
        return true;
    }
};
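
For completeness, registering this single functor would look roughly like the following (a sketch; the initial values are the ones used by the tutorial, with x held as one 4-element parameter block):

double x[4] = {3.0, -1.0, 0.0, 1.0};

ceres::Problem problem;
// One residual block: 4 residuals, one parameter block of size 4.
problem.AddResidualBlock(
    new ceres::AutoDiffCostFunction<F, 4, 4>(new F), nullptr, x);
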
  1. What's the advantage of defining separate residual blocks (and implicitly parameter blocks) for each element of the residual vector F?

  2. If residual F1 depends on parameters x1 and x2, and residual F2 depends on x3 and x4, will the cost of F with respect to x1 affect the value of x3?

Curve Fitting

The other example finds parameters m and c to fit the curve y = e^(mx + c).

It defines an ExponentialResidual which simply outputs T(y_) - exp(m[0] * T(x_) + c[0]), where (x_, y_) is a data point.
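
From memory of the tutorial, that functor looks roughly like this (treat it as a sketch rather than a verbatim copy):

struct ExponentialResidual {
    ExponentialResidual(double x, double y) : x_(x), y_(y) {}

    template <typename T>
    bool operator()(const T* const m, const T* const c, T* residual) const {
        // Error between the observed y and the model's prediction at x.
        residual[0] = y_ - exp(m[0] * x_ + c[0]);
        return true;
    }

  private:
    const double x_;
    const double y_;
};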

They then proceed to add one residual block for every observation:

double m = 0.0;
double c = 0.0;

Problem problem;
for (int i = 0; i < kNumObservations; ++i) {
  CostFunction* cost_function =
       new AutoDiffCostFunction<ExponentialResidual, 1, 1, 1>(
           new ExponentialResidual(data[2 * i], data[2 * i + 1]));
  problem.AddResidualBlock(cost_function, NULL, &m, &c);
}
  1. Although I was too lazy to reproduce this example myself, I suspect this could also be accomplished with just one residual block mapping R^2 -> R^1, where the 1-D residual is the sum of T(y_) - exp(m[0] * T(x_) + c[0]) over all (x_, y_). Was it necessary to define a residual block for every observation?

Thanks for reading this long-ish post!


Solution

  • The primary reason is that Ceres considers sparsity only at the level of parameter and residual blocks, not individual terms. For a truly dense problem, where every residual term depends upon every parameter, how the problem is represented has no significant impact on the runtime performance.

    However, Ceres is designed to handle very large sparse problems where every residual term depends on only a few parameters, but there are a lot of parameters and residuals. It is absolutely critical to exploit the sparsity structure of these problems to be able to solve them efficiently - i.e. avoid a lot of pointless calculations where all the terms are zero. Whilst the linear algebra is mathematically identical, the approach by which the linear system is represented and solved is very different when dealing with sparse matrices.
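
    Concretely, in Powell's function each of the four terms depends on only two of the four parameters, so the 4x4 Jacobian has just two non-zero entries per row. Splitting the problem into four R^2 -> R^1 residual blocks is what tells Ceres where those zeros are (a sketch of the block-sparsity pattern, rows F1..F4, columns x1..x4):

              x1   x2   x3   x4
        F1  [  *    *    0    0 ]
        F2  [  0    0    *    * ]
        F3  [  0    *    *    0 ]
        F4  [  *    0    0    * ]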

    A secondary reason is that it makes modelling the problem easier. A loss function is applied to the result of r^T * r (where r is a residual block). If you have only a single residual block, the loss function effectively scales the total cost, rather than down-weighting just the parts of the problem with large errors (outlier rejection). Ceres can also thread the evaluation of residual blocks, speeding up the evaluation of large problems.
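
    For example, in the curve fitting problem a robust loss can be attached to each observation's residual block, so a single outlier is down-weighted without distorting the rest (a sketch; ceres::CauchyLoss(0.5) is just an illustrative choice of loss and scale):

    problem.AddResidualBlock(cost_function,
                             new ceres::CauchyLoss(0.5),  // down-weights only this observation
                             &m, &c);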

    The last part of your question is different. In the curve fitting example there are two parameters and kNumObservations residuals, so the problem is heavily overdetermined. However, if you instead had a single residual that internally computed the errors for all kNumObservations points but returned only their sum, the problem as solved would have two parameters and just one residual, and would be underdetermined. As a separate note, returning the sum is also mathematically different: you are not returning the sum of squares, so errors of opposite sign at different points can cancel each other out, and none of that is observable by the solver, as the sketch below makes concrete.
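
    To make the cancellation concrete: two observations with errors +e and -e contribute 2e^2 to the true cost, yet sum to zero inside a single summed residual. A sketch of the problematic formulation (SummedExponentialResidual is a hypothetical name, not part of the tutorial):

    struct SummedExponentialResidual {
        SummedExponentialResidual(const double* data, int n) : data_(data), n_(n) {}

        template <typename T>
        bool operator()(const T* const m, const T* const c, T* residual) const {
            // Collapses every observation into ONE scalar: the solver now sees
            // two parameters but a single residual, and opposite-signed errors
            // cancel before it can observe them.
            residual[0] = T(0.0);
            for (int i = 0; i < n_; ++i) {
                residual[0] += data_[2 * i + 1] - exp(m[0] * data_[2 * i] + c[0]);
            }
            return true;
        }

        const double* data_;
        int n_;
    };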