I am confused about the "newton-cg" and "newton-cholesky" explanations in different sources. According to the sklearn documentation,
the “newton-cholesky” solver is an exact Newton solver that calculates the Hessian matrix and solves the resulting linear system.
However, after reading a wonderful answer about LogisticRegression solvers, I thought that this is exactly what "newton-cg" does.
The comparison of different optimization methods linked from the sklearn documentation did not really clarify things either.
So what is the difference between them?
When in doubt, you can look at the source code of scikit-learn: https://github.com/scikit-learn/scikit-learn.
Both solvers, "newton-cg“ and „newton-cholesky“, use the hessian (hence the naming after Newton) and solve the normal equations of the GLM optimization problem. The difference is as follows:
"newton-cholesky" explicitly constructs the Hessian matrix and solves the linear system exactly (within floating point arithmetic) via a Cholesky or LDL decomposition. On top of that, it performs a line search for better convergence guarantees.
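To make this concrete, here is a minimal sketch of one such Newton step for unpenalized logistic regression with labels in {0, 1}. It is illustrative only (the function name is mine, and the line search and regularization are omitted), not scikit-learn's actual implementation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_cholesky_step(X, y, w):
    """One Newton step: build the full Hessian, then Cholesky-solve."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y)                  # gradient of the log-loss
    W = p * (1.0 - p)                     # per-sample Hessian weights
    H = X.T @ (X * W[:, None])            # explicit (n_features, n_features) Hessian
    c, low = cho_factor(H)                # Cholesky factorization H = L L^T
    return w - cho_solve((c, low), grad)  # exact solve, up to floating point
```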
"newton-cg" (Newton conjugate gradient) never constructs the Hessian matrix. It only uses matrix-vector products with it, e.g. hessian @ x. It then solves the linear system iteratively by conjugate gradient steps until the solution is accurate enough.
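For contrast, here is the matching sketch of a Newton-CG step. The same linear system is solved, but only through Hessian-vector products fed to SciPy's conjugate gradient solver; again, this illustrates the technique, not scikit-learn's code:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_cg_step(X, y, w):
    """One Newton step without ever forming the Hessian explicitly."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y)
    W = p * (1.0 - p)

    def hessp(v):
        # Hessian-vector product: H @ v == X.T @ (W * (X @ v)),
        # computed without materializing the (n_features, n_features) matrix H
        return X.T @ (W * (X @ v))

    H = LinearOperator((w.size, w.size), matvec=hessp, dtype=X.dtype)
    step, info = cg(H, grad)  # iterative conjugate gradient solve
    return w - step
```

In both sketches you would iterate the step until the gradient norm is small. The practical trade-off is that building the explicit Hessian costs O(n_samples * n_features^2), while each Hessian-vector product costs only O(n_samples * n_features), which is why the CG variant scales better to many features.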