The following lines of code appear in the Levenberg-Marquardt implementation in the optimization package "Optim":
DtD = diagm(Float64[max(x, MIN_DIAGONAL) for x in sum(J.^2,1)])
delta_x = ( J'*J + sqrt(lambda)*DtD ) \ -J'*fcur
However, my question has nothing to do with the algorithm or anything specific to the package. I guess it has more to do with linear algebra and factorization in base Julia.
If I have a full matrix J, the following works:
In [3]: n = 5
J = rand(n,n)
fcur = rand(n)
DtD = diagm(Float64[max(x, 1e-6) for x in sum(J.^2,1)])
( J'*J + sqrt(100)*DtD ) \ -J'*fcur
Out [3]: 5-element Array{Float64,1}:
-0.0740316
-0.0643279
-0.0665033
-0.10568
-0.0599613
However, if J is sparse, I get an error:
In [4]: J = sparse(J)
DtD = diagm(Float64[max(x, 1e-6) for x in sum(J.^2,1)])
( J'*J + sqrt(100)*DtD ) \ -J'*fcur
Out [4]: ERROR: `A_ldiv_B!` has no method matching A_ldiv_B!(::Cholesky{Float64}, ::SparseMatrixCSC{Float64,Int64})
while loading In[4], in expression starting on line 3
in \ at linalg/generic.jl:233
So as far as I understand (with my limited knowledge of Julia as a beginner), this error occurs because Julia tries to compute ( J'*J + sqrt(100)*DtD ) \ -J'
first. My first question is: how can I find out which path Julia takes when evaluating the above code? I am aware of @which,
but I don't know how to use it to get to A_ldiv_B!, as this should start with \(A,B)
and then somehow end up at A_ldiv_B!:
In [6]: a = ( J'*J + sqrt(100)*DtD ); b = -J'; @which a\b
Out [6]: \(A::Union(SubArray{T,2,A<:DenseArray{T,N},I<:(Union(Range{Int64},Int64)...,)},DenseArray{T,2}),B::Union(SubArray{T,2,A<:DenseArray{T,N},I<:(Union(Range{Int64},Int64)...,)},SubArray{T,1,A<:DenseArray{T,N},I<:(Union(Range{Int64},Int64)...,)},DenseArray{T,2},DenseArray{T,1})) at linalg/dense.jl:409
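One version-independent way to see the first step of that path, without running anything, is to quote the expression and inspect how the parser groups it: `\` and `*` have equal precedence and associate left to right, so the solve is applied to the matrix -J' before fcur enters at all. A minimal sketch (the exact Expr layout shown is from current Julia; older versions lower ' differently, but the grouping is the same):

```julia
# Quote the expression to see how the parser groups it (nothing is evaluated).
ex = :( A \ -J' * fcur )

# The outermost call is `*`, i.e. the expression is (A \ -J') * fcur:
println(ex.args[1])          # prints: *
# ...and the inner call is the solve, with the matrix -J' on the right:
println(ex.args[2].args[1])  # prints: \
```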
Also note that:
In [7]: typeof(a)
Out [7]: Array{Float64,2}
In [8]: typeof(b)
Out [8]: Array{Float64,2}
This makes it even more confusing, as there is no Cholesky type anywhere in the above. My second question is: how does the Cholesky type appear? The error message says: A_ldiv_B!
has no method matching A_ldiv_B!(::Cholesky{Float64}, ::SparseMatrixCSC{Float64,Int64})
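As a sketch of where the Cholesky comes from (shown here in current Julia, where A_ldiv_B! has become ldiv!): for a square dense matrix, `\` first calls factorize on the left-hand side, and since J'*J + sqrt(100)*DtD is symmetric positive definite, factorize returns a Cholesky object, which is then handed to the in-place solver together with the right-hand side:

```julia
using LinearAlgebra

# A small symmetric positive definite matrix, standing in for
# J'*J + sqrt(100)*DtD from the question.
A = [4.0 1.0; 1.0 3.0]

# Dense `\` factorizes its left-hand side first; for a symmetric
# positive definite matrix, factorize chooses a Cholesky factorization.
F = factorize(A)
println(typeof(F) <: Cholesky)   # prints: true

# The factorization object is what gets passed to the in-place solver;
# the error above fires because the right-hand side it receives is sparse.
b = [1.0, 2.0]
x = F \ b
println(A * x ≈ b)               # prints: true
```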
Another interesting point I found by accident is that if the sparse matrix is of size (2,2), the above error does not occur:
In [9]: n = 2
J = sparse(rand(n,n))
fcur = rand(n)
DtD = diagm(Float64[max(x, 1e-6) for x in sum(J.^2,1)])
( J'*J + sqrt(100)*DtD ) \ -J'*fcur
Out [9]: 2-element Array{Float64,1}:
-0.0397989
-0.052129
Finally, I could solve this problem by putting -J'*fcur
in parentheses, which seems to be the author's intention anyway. But I am very confused. Any thoughts are greatly appreciated. Thanks!
In [12]: n = 5
J = sparse(rand(n,n))
fcur = rand(n)
DtD = diagm(Float64[max(x, 1e-6) for x in sum(J.^2,1)])
( J'*J + sqrt(100)*DtD ) \ (-J'*fcur)
Out [12]: 5-element Array{Float64,1}:
-0.0536266
-0.0692286
-0.0673166
-0.0616349
-0.0559813
It can be a bit tricky to figure out exactly which code path is taken when you run into code that uses substitutions during parsing, as is the case for '. You could try
julia> ( J'*J + sqrt(100)*DtD ) \ -J'fcur
to see another substitution taking place.
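For example, J'fcur contains no operator at all, yet it parses as the multiplication J' * fcur. A quick check by quoting (the form shown is what current Julia produces; the grouping is the same in 0.x versions):

```julia
# `J'fcur` has no explicit operator, but the parser inserts a multiplication:
ex = :( J'fcur )
println(ex.args[1])   # prints: *   -- i.e. the expression is J' * fcur
```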
I don't know if it really answers your question, but I'll note three things about your example.
1. Without the parentheses, the expression parses as
(( J'*J + sqrt(100)*DtD ) \ ctranspose(-J))*fcur
so the solve is applied to the matrix -J' first, and the result is then multiplied by fcur.
2. We haven't implemented sparse right-hand sides in solves, because even when b in Ax=b is sparse, the result is typically not sparse, so the gain from exploiting the sparseness of b is modest.
3. The "right" way of solving the problem is to add the parentheses around (-J'*fcur), because then the computation is a sparse matrix-vector multiplication followed by a matrix-vector solve, instead of a matrix-matrix solve followed by a dense matrix-vector multiplication.
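Concretely, the parenthesized version first forms the dense vector -J'*fcur with one sparse matrix-vector product, and then performs a single solve with a vector right-hand side. A sketch in current Julia syntax (note: sum(J.^2, 1) is now sum(J.^2, dims=1) and diagm takes a 0 => v pair; the question's 0.3-era code differs only in spelling):

```julia
using LinearAlgebra, SparseArrays

n = 5
J = sparse(rand(n, n))
fcur = rand(n)

# The question's DtD, written in current syntax.
DtD = diagm(0 => [max(x, 1e-6) for x in vec(sum(J.^2, dims=1))])

A = J' * J + sqrt(100) * DtD   # sparse + dense promotes to a dense matrix
rhs = -(J' * fcur)             # sparse matrix-vector product -> dense Vector
x = A \ rhs                    # one solve with a vector right-hand side

println(A * x ≈ rhs)           # prints: true
```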