[SOLVED] Getting incorrect result with cusolverDnDpotrfBatched

Getting incorrect result with cusolverDnDpotrfBatched

I want to find the Cholesky decomposition of a 3x3 matrix using cusolverDnDpotrfBatched, but I am not getting the zeros that should be present in a lower triangular matrix. Here is the matrix for which I want to compute the cholesky decomposition [1 2 3; 2 5 5; 3 5 12]. Is it supposed to be this way? What am I missing? I am aware of this post Cholesky decomposition with CUDA .Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <iostream>

void printMatrix(int m, int n, const double*A, int lda, const char* name)
{
for(int row = 0 ; row < m ; row++){
    for(int col = 0 ; col < n ; col++){
        double Areg = A[row + col*lda];
        printf("%s(%d,%d) = %f\n", name, row+1, col+1, Areg);
    }
}
}

int main(){
cusolverDnHandle_t handle = NULL;
cusolverDnCreate(&handle);

const cublasFillMode_t uplo = CUBLAS_FILL_MODE_LOWER;
const int batchSize = 1;
//const int nrhs = 1;
const int m = 3;
const int lda = m;
//const int ldb = m;
double A0[lda*m] = { 1.0, 2.0, 3.0, 2.0, 5.0, 5.0, 3.0, 5.0, 12.0 };
int infoArray[batchSize]; /* host copy of error info */

double L0[lda*m]; /* cholesky factor of A0 */

double *Aarray[batchSize];
//double *Barray[batchSize];

double **d_Aarray = NULL;
int *d_infoArray = NULL;
for(int j = 0 ; j < batchSize ; j++){
    cudaMalloc ((void**)&Aarray[j], sizeof(double) * lda * m);
    
}
cudaMalloc ((void**)&d_infoArray, sizeof(int)*batchSize);
//assert(cudaSuccess == cudaStat1);
cudaMalloc ((void**)&d_Aarray, sizeof(double*) * batchSize);
cudaMemcpy(Aarray[0], A0, sizeof(double) * lda * m, cudaMemcpyHostToDevice);
cudaMemcpy(d_Aarray, Aarray, sizeof(double*)*batchSize, cudaMemcpyHostToDevice);
cusolverDnDpotrfBatched( handle,uplo,m,d_Aarray,lda,d_infoArray, batchSize);
cudaDeviceSynchronize();
cudaMemcpy(infoArray, d_infoArray, sizeof(int)*batchSize, cudaMemcpyDeviceToHost);
cudaMemcpy(L0, Aarray[0], sizeof(double) * lda * m, cudaMemcpyDeviceToHost);

for(int i =0; i<9;i++)std::cout<<L0[i]<<std::endl;
//printMatrix(m, m, L0, lda, "L0");
//printf("=====\n");
}

Solution

I am not getting the zeros that should be present in a lower triangular matrix.

Perhaps you may wish to read the documentation:

If input parameter uplo is CUBLAS_FILL_MODE_LOWER, only lower triangular part of A is processed, and replaced by lower triangular Cholesky factor L.

Remark: the other part of A is used as a workspace. For example, if uplo is CUBLAS_FILL_MODE_UPPER, upper triangle of A contains cholesky factor U and lower triangle of A is destroyed after potrfBatched.

So there is no expectation that the other part of the matrix will have zeroes.