Tags: r, matrix, matrix-decomposition, qr-decomposition

Shaken faith in `qr()`


I have relied on the qr() function a lot in dealing with rank-deficient situations, but have recently run into some examples where it doesn't work correctly. Consider the matrix badX below:

badX <-
structure(c(-1.641906809157e-10, 0, 0, 0, 0, -0.5, 0, 0, -1.10482935525559e-16, 
            0, -3.06266685765538e-17, 0, -4.83736007092039e-17, 0, -3.14414492582296e-18, 
            -3.06158275836099e-18), dim = c(4L, 4L), dimnames = list(c("(Intercept)", 
            "A2", "A3", "B2"), NULL))

We cannot invert this matrix using solve():

solve(badX)
## Error in solve.default(badX): system is computationally singular: reciprocal condition number = 5.55308e-18

Yet qr() and its associated routines think this matrix has rank 4 and can invert it:

qr(badX)$rank
## [1] 4

qr.solve(badX)
##             [,1] [,2]          [,3]          [,4]
## [1,] -6090479645    0  2.197085e+10  7.366741e+10
## [2,]           0   -2  0.000000e+00  0.000000e+00
## [3,]           0    0 -3.265128e+16  3.353179e+16
## [4,]           0    0  0.000000e+00 -3.266284e+17

This is a pretty ugly result. I have tried varying the tol argument, with no change in results.
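One small illustration (mine, not from the question) of why tol has so little effect: qr()'s rank test in the default LINPACK routine is relative to each column's own norm, so a uniformly tiny but well-conditioned matrix is still reported as full rank no matter what tol is.

```r
# A uniformly tiny but well-conditioned 2x2 matrix: its columns are
# genuinely independent, just small, so qr() reports full rank.
M <- 1e-15 * matrix(c(2, 1, 1, 2), 2, 2)
qr(M)$rank                # 2
qr(M, tol = 1e-7)$rank    # still 2 -- the dependence test is scale-relative
```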

For context, the origin of this result is this contrast matrix:

badL <-
structure(c(0, 0, 0, 0, 0, -9.89189274870351e-11, 0, -5.55111512312578e-17, 
    -2.77555756156289e-17, 1.11022302462516e-16, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.25, 0, 0, 0, 0, -0.25, 0, 0, 
    0, 9.89189274870351e-11, 0, 5.55111512312578e-17, 2.77555756156289e-17, 
    -1.11022302462516e-16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, -4.23939184015843e-11, 0, -4.16333634234434e-17, -1.38777878078145e-17, 
    5.55111512312578e-17, 0, 0, 0, 0, 0, -4.23939184015843e-11, 0, 
    -4.16333634234434e-17, -1.38777878078145e-17, 5.55111512312578e-17, 
    0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 
    0, 0, 0, 0, 0, 0, 4.23939184015843e-11, 0, 4.16333634234434e-17, 
    1.38777878078145e-17, -5.55111512312578e-17, 0, 0, 0, 0, 0, -1.41313127284714e-11, 
    0, -6.93889390390723e-18, -6.93889390390723e-18, 1.38777878078145e-17, 
    4.23939184015843e-11, 0, 4.16333634234434e-17, 1.38777878078145e-17, 
    -5.55111512312578e-17, 0, 0, 0, 0, 0), dim = c(5L, 24L), dimnames = list(
    NULL, c("(Intercept)", "A2", "A3", "B2", "B3", "C2", "C3", 
    "A2:B2", "A3:B2", "A2:B3", "A3:B3", "A2:C2", "A3:C2", "A2:C3", 
    "A3:C3", "B2:C2", "B3:C2", "B2:C3", "B3:C3", "A2:B2:C2", 
    "A3:B2:C2", "A3:B3:C2", "A2:B2:C3", "A3:B2:C3")))

... from which I obtained the QR decomposition of its transpose, to find that it is supposedly of rank 4:

badQR <- qr(t(badL))
badQR$rank
## [1] 4

The above matrix badX is equal to qr.R(badQR)[1:4, 1:4], which, based on the rank calculation, was supposed to be a full-rank upper-triangular matrix.

My remedy seems to be to use zapsmall() so that I get the rank right...

qr(zapsmall(t(badL)))$rank
## [1] 1

My question is: why does this happen? If you look at badL, it's pretty clear that four of its rows are essentially zero and only the second row is meaningfully nonzero. I would have thought that qr()'s pivoting methods would handle this better. Is there a better way to get more reliable results?

I am running Windows 11 Pro, version 10.0.22000 build 22000. Here's my R system information.

R.Version()
## $platform
## [1] "x86_64-w64-mingw32"
## 
## $arch
## [1] "x86_64"
## 
## $os
## [1] "mingw32"
## 
## $crt
## [1] "ucrt"
## 
## $system
## [1] "x86_64, mingw32"
## 
## $status
## [1] ""
## 
## $major
## [1] "4"
## 
## $minor
## [1] "2.0"
## 
## $year
## [1] "2022"
## 
## $month
## [1] "04"
## 
## $day
## [1] "22"
## 
## $`svn rev`
## [1] "82229"
## 
## $language
## [1] "R"
## 
## $version.string
## [1] "R version 4.2.0 (2022-04-22 ucrt)"
## 
## $nickname
## [1] "Vigorous Calisthenics"

Created on 2022-06-21 by the reprex package (v2.0.1)

More on context

This question came up because I was trying to produce results like this (for a simpler example) in the emmeans package:

> (jt = joint_tests(warpx.emm))
 model term   df1 df2 F.ratio p.value note
 tension        1  37   5.741  0.0217    e
 wool:tension   1  37   5.867  0.0204    e
 (confounded)   2  37   7.008  0.0026  d e

d: df1 reduced due to linear dependence 
e: df1 reduced due to non-estimability

... and in particular, the (confounded) part. This example is a two-factor model with wool at 2 levels and tension at 3 levels; however, one of the factor combinations is omitted from the data, which means that we can estimate only 1 d.f. each for the tension main effect and the wool:tension interaction, and no main effect at all for wool. There being 4 d.f. for all possible contrasts among the 5 nonempty cells, there are 2 d.f. left over, and those are in the (confounded) part.

The computation is based on the associated estimable functions:

> attr(jt, "est.fcns")
$tension
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     0        1        0            0.5              0

$`wool:tension`
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     0        0        0              1              0

$`(confounded)`
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0    -1        0        0              0              0
[2,]           0     1        0        0              0              0
[3,]           0    -1        0        0              0              0
[4,]           0    -1        0        1              0              0

... and on the contrasts among all cells in the design:

> contrast(warpx.emm, "consec")@linfct
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     1        0        0              0              0
[2,]           0    -1        1        0              0              0
[3,]           0     1        0        0              1              0
[4,]           0    -1       -1        1             -1              0
[5,]           0     1        0        0              0              1

The method I use is to combine the estimable functions for tension and wool:tension, and obtain the QR decomposition of its transpose. Then I use qr.resid() with that and the transpose of the above cell contrasts. That leaves us (after transposing yet again) with the estimable functions shown for (confounded). That matrix has 4 rows, but its rank is only 2, as determined by the QR decomposition of this result; then I extract the 2x2 portion of the R part to complete the computation of the F statistic.
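As a minimal sketch of that residual-projection step (with small stand-in matrices -- E uses the tension and wool:tension estimable functions shown above, and L the first two cell contrasts):

```r
# E: rows are known estimable functions; L: rows are cell contrasts
# whose "leftover" (confounded) part we want
E <- rbind(c(0, 0, 1, 0, 0.5, 0),
           c(0, 0, 0, 0, 1,   0))
L <- rbind(c(0,  1, 0, 0, 0, 0),
           c(0, -1, 1, 0, 0, 0))
qrE <- qr(t(E))
# residuals of the columns of t(L) after projecting onto the span of t(E)
leftover <- t(qr.resid(qrE, t(L)))
# every leftover row is orthogonal to every row of E
max(abs(leftover %*% t(E)))   # effectively zero
```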

The example at the beginning of this question is similar, but with a larger, more complex model; the badL matrix is the result of the qr.resid() procedure described above. In this context, some of those rows arguably should be zero. My workaround at present is to examine the diagonal elements of R (badX in the question above) and select those that exceed an absolute threshold.

The essential idea here is that I need to decompose that matrix of all contrasts into two parts -- the known estimable functions and the leftovers. And an interesting aspect of this is that the rank of the latter part is known, a fact that I have not taken advantage of. In future development, it may well be better, per @duffymo, to use an SVD rather than these gyrations with qr.resid(). There's always new stuff to learn...
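A hedged sketch of that SVD-based alternative (svd_rank is a hypothetical helper, not an emmeans function): take the numerical rank to be the number of singular values above a threshold relative to the largest one.

```r
# Hypothetical helper: numerical rank via singular values
svd_rank <- function(A, tol = sqrt(.Machine$double.eps)) {
  d <- svd(A, nu = 0, nv = 0)$d
  sum(d > tol * max(d))
}

svd_rank(diag(3))                   # 3
svd_rank(cbind(c(1, 0), c(2, 0)))   # 1: second column is a multiple of the first
```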


Solution

  • You are complaining that solve cannot invert a matrix that seems to be full rank (according to qr), and you think that solve is doing the correct thing while qr is not.

    Well, do not trust solve. It is not a robust numerical procedure and we can fool it easily. Here is a diagonal matrix. It is certainly invertible (by simply inverting its diagonal elements), but solve just can't do it.

    D <- diag(c(1, 1e-20))
    #     [,1]  [,2]
    #[1,]    1 0e+00
    #[2,]    0 1e-20
    
    solve(D)
    #Error in solve.default(D) : 
    #  system is computationally singular: reciprocal condition number = 1e-20
    
    Dinv <- diag(c(1, 1e+20))
    
    ## an identity matrix, as expected
    D %*% Dinv
    #     [,1] [,2]
    #[1,]    1    0
    #[2,]    0    1
    
    ## an identity matrix, as expected
    Dinv %*% D
    #     [,1] [,2]
    #[1,]    1    0
    #[2,]    0    1
    

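As an aside (a sketch, not a recommendation): solve.default has a tol argument governing this check, and lowering it lets the inversion go through.

```r
D <- diag(c(1, 1e-20))
# tol = 0 disables the reciprocal-condition-number check in solve.default
Dinv <- solve(D, tol = 0)
Dinv
#     [,1]  [,2]
#[1,]    1 0e+00
#[2,]    0 1e+20
```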
    Now let's look at your badX, which I call R (as it is the upper triangular matrix returned by QR factorization).

    R <-
    structure(c(-1.641906809157e-10, 0, 0, 0, 0, -0.5, 0, 0, -1.10482935525559e-16, 
                0, -3.06266685765538e-17, 0, -4.83736007092039e-17, 0, -3.14414492582296e-18, 
                -3.06158275836099e-18), dim = c(4L, 4L))
    

    solve cannot invert it, but qr.solve gives you a proper inverse matrix.

    Rinv <- qr.solve(R)
    
    ## an identity matrix, as expected
    R %*% Rinv
    #     [,1] [,2] [,3]         [,4]
    #[1,]    1    0    0 1.776357e-15
    #[2,]    0    1    0 0.000000e+00
    #[3,]    0    0    1 0.000000e+00
    #[4,]    0    0    0 1.000000e+00
    
    ## an identity matrix, as expected
    Rinv %*% R
    #     [,1] [,2] [,3]         [,4]
    #[1,]    1    0    0 5.293956e-23
    #[2,]    0    1    0 0.000000e+00
    #[3,]    0    0    1 1.387779e-17
    #[4,]    0    0    0 1.000000e+00
    

    QR factorization is numerically stable because it is insensitive to the relative scale (size, magnitude) of different columns, unlike the LU factorization on which solve is based. By definition, this factorization does

    X = Q R

    If we re-scale X's columns by right-multiplying by a full-rank diagonal matrix D, the Q factor of the QR factorization does not change; only R is re-scaled.

    X D = Q R D
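A small check of this claim, on a random matrix rather than the one from the question: re-scaling columns leaves Q, and hence the reported rank, unchanged.

```r
set.seed(1)
A  <- matrix(rnorm(12), 4, 3)   # a generic full-rank matrix
D3 <- diag(c(1, 1e-8, 1e8))     # wildly different column scales
all.equal(qr.Q(qr(A)), qr.Q(qr(A %*% D3)))   # TRUE (up to rounding)
c(qr(A)$rank, qr(A %*% D3)$rank)             # both 3
```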

    So let's look at your big matrix t(badL) to which you apply the QR factorization. I call it X.

    X <- structure(c(0, -9.89189274870351e-11, 0, 0, 0, 0, 0, 9.89189274870351e-11, 
    0, 0, 0, -4.23939184015843e-11, 0, -4.23939184015843e-11, 0, 
    0, 0, 0, 0, 4.23939184015843e-11, 0, -1.41313127284714e-11, 4.23939184015843e-11, 
    0, 0, 0, 0, 0, 0, -0.25, -0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 
    0, 0.25, 0, 0, 0, 0, 0, 0, 0, -5.55111512312578e-17, 0, 0, 0, 
    0, 0, 5.55111512312578e-17, 0, 0, 0, -4.16333634234434e-17, 0, 
    -4.16333634234434e-17, 0, 0, 0, 0, 0, 4.16333634234434e-17, 0, 
    -6.93889390390723e-18, 4.16333634234434e-17, 0, 0, -2.77555756156289e-17, 
    0, 0, 0, 0, 0, 2.77555756156289e-17, 0, 0, 0, -1.38777878078145e-17, 
    0, -1.38777878078145e-17, 0, 0, 0, 0, 0, 1.38777878078145e-17, 
    0, -6.93889390390723e-18, 1.38777878078145e-17, 0, 0, 1.11022302462516e-16, 
    0, 0, 0, 0, 0, -1.11022302462516e-16, 0, 0, 0, 5.55111512312578e-17, 
    0, 5.55111512312578e-17, 0, 0, 0, 0, 0, -5.55111512312578e-17, 
    0, 1.38777878078145e-17, -5.55111512312578e-17, 0), dim = c(24L, 
    5L))
    
    #               [,1]  [,2]          [,3]          [,4]          [,5]
    # [1,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    # [2,] -9.891893e-11  0.00 -5.551115e-17 -2.775558e-17  1.110223e-16
    # [3,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    # [4,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    # [5,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    # [6,]  0.000000e+00 -0.25  0.000000e+00  0.000000e+00  0.000000e+00
    # [7,]  0.000000e+00 -0.25  0.000000e+00  0.000000e+00  0.000000e+00
    # [8,]  9.891893e-11  0.00  5.551115e-17  2.775558e-17 -1.110223e-16
    # [9,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[10,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[11,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[12,] -4.239392e-11  0.00 -4.163336e-17 -1.387779e-17  5.551115e-17
    #[13,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[14,] -4.239392e-11  0.00 -4.163336e-17 -1.387779e-17  5.551115e-17
    #[15,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[16,]  0.000000e+00  0.25  0.000000e+00  0.000000e+00  0.000000e+00
    #[17,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[18,]  0.000000e+00  0.25  0.000000e+00  0.000000e+00  0.000000e+00
    #[19,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[20,]  4.239392e-11  0.00  4.163336e-17  1.387779e-17 -5.551115e-17
    #[21,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    #[22,] -1.413131e-11  0.00 -6.938894e-18 -6.938894e-18  1.387779e-17
    #[23,]  4.239392e-11  0.00  4.163336e-17  1.387779e-17 -5.551115e-17
    #[24,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
    

    Let's re-scale its columns so that every column has Euclidean norm (L2 norm, 2-norm) 1.

    norm2 <- sqrt(colSums(X ^ 2))
    
    XD <- X * rep(1 / norm2, each = nrow(X))
    
    #             [,1] [,2]        [,3]       [,4]        [,5]
    # [1,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    # [2,] -0.60246371  0.0 -0.48418203 -0.5714286  0.57585260
    # [3,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    # [4,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    # [5,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    # [6,]  0.00000000 -0.5  0.00000000  0.0000000  0.00000000
    # [7,]  0.00000000 -0.5  0.00000000  0.0000000  0.00000000
    # [8,]  0.60246371  0.0  0.48418203  0.5714286 -0.57585260
    # [9,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[10,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[11,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[12,] -0.25819930  0.0 -0.36313652 -0.2857143  0.28792630
    #[13,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[14,] -0.25819930  0.0 -0.36313652 -0.2857143  0.28792630
    #[15,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[16,]  0.00000000  0.5  0.00000000  0.0000000  0.00000000
    #[17,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[18,]  0.00000000  0.5  0.00000000  0.0000000  0.00000000
    #[19,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[20,]  0.25819930  0.0  0.36313652  0.2857143 -0.28792630
    #[21,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    #[22,] -0.08606647  0.0 -0.06052275 -0.1428571  0.07198158
    #[23,]  0.25819930  0.0  0.36313652  0.2857143 -0.28792630
    #[24,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
    

    What do you think now? Is it still a matrix with only one nonzero column? Although qr(X) does not actually first re-scale all columns before QR factorization, looking at XD does help you better understand why QR factorization is more robust.

    If you do want to intervene, do not use zapsmall; instead, threshold columns by their 2-norm.

    X0 <- X
    X0[, norm2 < max(norm2) * sqrt(.Machine$double.eps)] <- 0
    QR0 <- qr(X0)
    
    QR0$rank
    # [1] 1
    

    How do we know that sqrt(.Machine$double.eps) is an appropriate threshold?

    Any threshold between sqrt(.Machine$double.eps) (about 1e-8) and .Machine$double.eps (about 1e-16) is reasonable. Using .Machine$double.eps recovers the regular QR result, giving you rank 4.

    The "sqrt" threshold comes from the situation where we want to look at X'X, which squares the condition number of X.