I repeatedly solve a linear system of equations Mx = b, where M is large and sparse.
M doesn't change between iterations, but b does. I've tried several methods and so far have found the backslash operator (\, i.e. mldivide) to be the most efficient and accurate.
The following code is very similar to what I'm doing:
for ii = 1:num_iter
    x = M\x;      % solve the system with the current right-hand side
    x = x + dx;
end
Now I want to accelerate the computation even more by utilizing the fact that M is fixed.
Setting spparms('spumoni',2)
makes MATLAB print detailed information about the algorithm chosen by the sparse solver.
I ran the following code:
spparms('spumoni',2);
x = M\B;
The output (monitoring):
sp\: bandwidth = 2452+1+2452.
sp\: is A diagonal? no.
sp\: is band density (0.01) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? no.
sp\: use Unsymmetric MultiFrontal PACKage with Control parameters:
UMFPACK V5.4.0 (May 20, 2009), Control:
Matrix entry defined as: double
Int (generic integer) defined as: UF_long
0: print level: 2
1: dense row parameter: 0.2
"dense" rows have > max (16, (0.2)*16*sqrt(n_col) entries)
2: dense column parameter: 0.2
"dense" columns have > max (16, (0.2)*16*sqrt(n_row) entries)
3: pivot tolerance: 0.1
4: block size for dense matrix kernels: 32
5: strategy: 0 (auto)
6: initial allocation ratio: 0.7
7: max iterative refinement steps: 2
12: 2-by-2 pivot tolerance: 0.01
13: Q fixed during numerical factorization: 0 (auto)
14: AMD dense row/col parameter: 10
"dense" rows/columns have > max (16, (10)*sqrt(n)) entries
Only used if the AMD ordering is used.
15: diagonal pivot tolerance: 0.001
Only used if diagonal pivoting is attempted.
16: scaling: 1 (divide each row by sum of abs. values in each row)
17: frontal matrix allocation ratio: 0.5
18: drop tolerance: 0
19: AMD and COLAMD aggressive absorption: 1 (yes)
The following options can only be changed at compile-time:
8: BLAS library used: Fortran BLAS. size of BLAS integer: 8
9: compiled for MATLAB
10: CPU timer is ANSI C clock (may wrap around).
11: compiled for normal operation (debugging disabled)
computer/operating system: Microsoft Windows
size of int: 4 UF_long: 8 Int: 8 pointer: 8 double: 8 Entry: 8 (in bytes)
sp\: is UMFPACK's symbolic LU factorization (with automatic reordering) successful? yes.
sp\: is UMFPACK's numeric LU factorization successful? yes.
sp\: is UMFPACK's triangular solve successful? yes.
sp\: UMFPACK Statistics:
UMFPACK V5.4.0 (May 20, 2009), Info:
matrix entry defined as: double
Int (generic integer) defined as: UF_long
BLAS library used: Fortran BLAS. size of BLAS integer: 8
MATLAB: yes.
CPU timer: ANSI clock ( ) routine.
number of rows in matrix A: 3468
number of columns in matrix A: 3468
entries in matrix A: 60252
memory usage reported in: 16-byte Units
size of int: 4 bytes
size of UF_long: 8 bytes
size of pointer: 8 bytes
size of numerical entry: 8 bytes
strategy used: symmetric
ordering used: amd on A+A'
modify Q during factorization: no
prefer diagonal pivoting: yes
pivots with zero Markowitz cost: 1284
submatrix S after removing zero-cost pivots:
number of "dense" rows: 0
number of "dense" columns: 0
number of empty rows: 0
number of empty columns 0
submatrix S square and diagonal preserved
pattern of square submatrix S:
number rows and columns 2184
symmetry of nonzero pattern: 0.904903
nz in S+S' (excl. diagonal): 62184
nz on diagonal of matrix S: 2184
fraction of nz on diagonal: 1.000000
AMD statistics, for strict diagonal pivoting:
est. flops for LU factorization: 2.76434e+007
est. nz in L+U (incl. diagonal): 306216
est. largest front (# entries): 31329
est. max nz in any column of L: 177
number of "dense" rows/columns in S+S': 0
symbolic factorization defragmentations: 0
symbolic memory usage (Units): 174698
symbolic memory usage (MBytes): 2.7
Symbolic size (Units): 9196
Symbolic size (MBytes): 0
symbolic factorization CPU time (sec): 0.00
symbolic factorization wallclock time(sec): 0.00
matrix scaled: yes (divided each row by sum of abs values in each row)
minimum sum (abs (rows of A)): 1.00000e+000
maximum sum (abs (rows of A)): 9.75375e+003
symbolic/numeric factorization: upper bound actual %
variable-sized part of Numeric object:
initial size (Units) 149803 146332 98%
peak size (Units) 1037500 202715 20%
final size (Units) 787803 154127 20%
Numeric final size (Units) 806913 171503 21%
Numeric final size (MBytes) 12.3 2.6 21%
peak memory usage (Units) 1083860 249075 23%
peak memory usage (MBytes) 16.5 3.8 23%
numeric factorization flops 5.22115e+008 2.59546e+007 5%
nz in L (incl diagonal) 593172 145107 24%
nz in U (incl diagonal) 835128 154044 18%
nz in L+U (incl diagonal) 1424832 295683 21%
largest front (# entries) 348768 30798 9%
largest # rows in front 519 175 34%
largest # columns in front 672 177 26%
initial allocation ratio used: 0.309
# of forced updates due to frontal growth: 1
number of off-diagonal pivots: 0
nz in L (incl diagonal), if none dropped 145107
nz in U (incl diagonal), if none dropped 154044
number of small entries dropped 0
nonzeros on diagonal of U: 3468
min abs. value on diagonal of U: 4.80e-002
max abs. value on diagonal of U: 1.00e+000
estimate of reciprocal of condition number: 4.80e-002
indices in compressed pattern: 13651
numerical values stored in Numeric object: 295806
numeric factorization defragmentations: 0
numeric factorization reallocations: 0
costly numeric factorization reallocations: 0
numeric factorization CPU time (sec): 0.05
numeric factorization wallclock time (sec): 0.00
numeric factorization mflops (CPU time): 552.22
solve flops: 1.78396e+006
iterative refinement steps taken: 1
iterative refinement steps attempted: 1
sparse backward error omega1: 1.80e-016
sparse backward error omega2: 0.00e+000
solve CPU time (sec): 0.00
solve wall clock time (sec): 0.00
total symbolic + numeric + solve flops: 2.77385e+007
Observe the lines:
numeric factorization flops 5.22115e+008 2.59546e+007 5%
solve flops: 1.78396e+006
total symbolic + numeric + solve flops: 2.77385e+007
This indicates that the factorization of M accounts for 2.59546e+007/2.77385e+007 ≈ 93.6% of the total flops needed to solve the system.
I would like to compute the factorization once, outside of my iterations, and then run only the solve stage, which accounts for the remaining ~6.4% of the flops.
I know how to compute the factorization ([L,U,P,Q,R] = lu(M);), but I don't know how to use its output as input to a solver.
I would like to run something in the spirit of:
[L,U,P,Q,R] = lu(M);
for ii = 1:num_iter
    dx = solve_pre_factored(L,U,P,Q,R,x);
    x = x + dx;
end
Is there a way to do that in Matlab?
You have to ask yourself what all these matrices from the LU factorization do.
As the documentation states:
[L,U,P,Q,R] = lu(A) returns unit lower triangular matrix L, upper triangular matrix U, permutation matrices P and Q, and a diagonal scaling matrix R so that P*(R\A)*Q = L*U for sparse non-empty A. Typically, but not always, the row-scaling leads to a sparser and more stable factorization. The statement lu(A,'matrix') returns identical output values.
Thus in more mathematical terms we have P R⁻¹ A Q = L U, and therefore A = R P⁻¹ L U Q⁻¹.
Then x = M\x can be rewritten as the following sequence of steps:

y = R⁻¹ x
z = P y
u = L⁻¹ z
v = U⁻¹ u
w = Q v
x = w
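Written out as MATLAB statements, a direct transcription of the steps above could look like this (assuming L, U, P, Q, R come from [L,U,P,Q,R] = lu(M), as in the question, and x holds the current right-hand side):

y = R\x;   % R is diagonal, so this is just a cheap row scaling
z = P*y;   % apply the row permutation
u = L\z;   % forward substitution with the unit lower triangular L
v = U\u;   % back substitution with the upper triangular U
x = Q*v;   % apply the column permutation; x now solves M*x = (old x)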
To "invert" U, L and R you can use \, which will recognize that they are triangular (and diagonal, for R) matrices, as the monitoring output should confirm, and use the appropriate trivial solvers for them.
Thus, in a denser and MATLAB-written way: x = Q*(U\(L\(P*(R\x))));
Doing this reproduces exactly what happens inside the \ solver, but with only a single factorization, as you asked.
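Put back into the loop from the question, a sketch (reusing num_iter and dx as defined there) would be:

[L,U,P,Q,R] = lu(M);          % factor once, outside the loop
for ii = 1:num_iter
    x = Q*(U\(L\(P*(R\x))));  % only permutations, scaling and triangular solves
    x = x + dx;
end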
However, as stated in the comments, for a large number of solves it becomes faster to compute N = M⁻¹ once and then only perform a simple matrix-vector multiplication, which is much simpler than the process explained above. The initial computation, inv(M), takes longer and has some limitations, so this trade-off also depends on the properties of your matrix.
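For comparison, a sketch of that alternative (same loop variables as above; whether it pays off depends on num_iter and on how dense inv(M) turns out to be, since the inverse of a sparse matrix is generally dense):

N = inv(M);        % one-time cost; N is generally dense even if M is sparse
for ii = 1:num_iter
    x = N*x;       % one matrix-vector product per iteration
    x = x + dx;
end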