[SOLVED] How to get the Q from the QR factorization output?

How to get the Q from the QR factorization output?

DGEQRF and SGEQRF from LAPACK return the Q part of the QR factorization in a packed format. Unpacking it seems to require O(k^3) steps (k low-rank products), and doesn't seem to be very straightforward. Plus, the numerical stability of doing k sequential multiplications is unclear to me.

Does LAPACK include a subroutine for unpacking Q, and if not, how should I do it?

Solution

Yes, LAPACK indeed offers a routine to retrieve Q from the elementary reflectors (i.e. the part of data returned by DGEQRF), it is called DORGQR. From the description:

*  DORGQR generates an M-by-N real matrix Q with orthonormal columns,
*  which is defined as the first N columns of a product of K elementary
*  reflectors of order M
*
*        Q  =  H(1) H(2) . . . H(k)
*  as returned by DGEQRF.

A complete calculation of Q and R from A using the C-wrapper LAPACKE could look like this (a Fortran adaption should be straight forward) :

    void qr( double* const _Q, double* const _R, double* const _A, const size_t _m, const size_t _n) {
        // Maximal rank is used by Lapacke
        const size_t rank = std::min(_m, _n); 
        
        // Tmp Array for Lapacke
        const std::unique_ptr<double[]> tau(new double[rank]);
        
        // Calculate QR factorisations
        LAPACKE_dgeqrf(LAPACK_ROW_MAJOR, (int) _m, (int) _n, _A, (int) _n, tau.get());
        
        // Copy the upper triangular Matrix R (rank x _n) into position
        for(size_t row =0; row < rank; ++row) {
            memset(_R+row*_n, 0, row*sizeof(double)); // Set starting zeros
            memcpy(_R+row*_n+row, _A+row*_n+row, (_n-row)*sizeof(double)); // Copy upper triangular part from Lapack result.
        }
        
        // Create orthogonal matrix Q (in tmpA)
        LAPACKE_dorgqr(LAPACK_ROW_MAJOR, (int) _m, (int) rank, (int) rank, _A, (int) _n, tau.get());
        
        //Copy Q (_m x rank) into position
        if(_m == _n) {
            memcpy(_Q, _A, sizeof(double)*(_m*_n));
        } else {
            for(size_t row =0; row < _m; ++row) {
                memcpy(_Q+row*rank, _A+row*_n, sizeof(double)*(rank));
            }
        }
    }

It's a piece of my own code, where I removed all checks to improve readability. For productive use you want to check that the input is valid and also mind the return values of the LAPACK calls. Note that the input A is destroyed -- overwritten with the values of R and values necessary to recreate Q:

* A is DOUBLE PRECISION array, dimension (LDA,N)
*          On entry, the M-by-N matrix A.
*          On exit, the elements on and above the diagonal of the array
*          contain the min(M,N)-by-N upper trapezoidal matrix R
*          (R is upper triangular if M >= N);
*          the elements below the diagonal are used to store part of
*          the data structure to represent Q.

To get R you can extract the upper triangular part (right of the diagonal) of A:

    // Copy over values of R to n x n matrix - for m >= n
    double* r = (double *)calloc(n * n, sizeof(double));
    for (int i = 0; i < n; i++)
    {
      // ROW-MAJOR
      cblas_dcopy(n - i, a + (i * n), 1, r + (i * n), 1);

      // COLUMN-MAJOR
      // cblas_dcopy(i + 1, a + (i * m), 1, r + (i * n), 1);
    }

Instead of cblas_dcopy you could use memcpy.