[SOLVED] All eigenvectors of large sparse matrices are zero

The issue regarding numpy.linalg.eig() and scipy.linalg.eig() may be a memory error related to the size of the matrix: a 50000x50000 double precision dense matrix occupies 18Go of memory.

numpy.linalg.eig() and scipy.linalg.eig() rely on the routine DGEEV() of LAPACK. LAPACK DGEEV() and DGEEVX() makes use of the QR algorithm to compute all eigenvalues and eigenvectors of a dense matrix. First, the matrix is reduced to to upper Hessenberg form using dgehrd(), then QR iterations are performed in dhseqr(). In DGEEVX(), the matrix is first balanced and scaled.

On the other hand,scipy.sparse.linalg.eigs() and scipy.sparse.linalg.eigsh() rely on ARPACK functions implementing the Implicitly Restarted Arnoldi Method and Implicitly Restarted Lanczos Method on sparse matrix. Both are improvements of the power method and are very efficient at computing the largest eigenvalues/eigenvectors with improved accuracy. If Ax=b can be rapidly solved, these iterative methods are also very efficent at finding the smallest eigenvalues/eigenvectors, or eigenvalues/eigenvectors near a given value.

The difference between these methods is explained in Lloyd N. Trefethen and David Bau, NUMERICAL LINEAR ALGEBRA, Lecture 33. The Arnoldi Iteration.

... whereas the Arnoldi iteration is based upon the QR factorization (33.7) of the matrix ,whose columns are b, A b, ... ,A^{n-1} b, simultaneous iteration and the QR algorithm are based upon the QR factorization (28.16) of the matrix whose columns are A^n e_1 ... , A^n e_n .

From a practical viewpoint, the Arnoldi iteration is always applied to retreive a limited number of eigenvectors significantly lower than the size of the matrix. Nevertheless, the argument sigma of scipy.sparse.linalg.eigs() or scipy.sparse.linalg.eigsh() enables finding interior eigenvalues and eigenvectors near sigma. Hence, it is possible to call scipy.sparse.linalg.eigsh() multiple times using different sigma. If the eigenvalues all feature a limited multiplicity, all eigenvectors can be recovered. Potentiel duplicates are to be avoided by separating eigenvalues and tracking their multiplicity.

The basic call using sigma writes:

sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')

If the matrix is Hermitian semi Here is a sample code, based on your previous deleted post, that computes all eigenvalues of a positive semi-definite sparse matrix. As the eigenvalues are all real and positive, sigma is progressively increased to find all eigenvalues:

import numpy as np
import networkx as nx
import scipy as sp




def tolist(sparsevalue, keeplast=False):
    listofeigval=[]
    listofmultiplicity=[]

    curreig=sparsevalue[0]-1.1*np.abs(sparsevalue[0])
    for i in range(len(sparsevalue)):
           #print curreig, sparsevalue[i], len(listofeigval)
           if np.abs(curreig-sparsevalue[i])<1e-11*(np.abs(curreig)+np.abs(sparsevalue[i])):
                listofmultiplicity[-1]=listofmultiplicity[-1]+1
           else :
                curreig=sparsevalue[i]
                listofeigval.append(curreig)
                listofmultiplicity.append(1)

    if keeplast==False:
        #the last one is not sure regarding multiplicity:
        listofeigval.pop()
        listofmultiplicity.pop()

    return listofeigval,listofmultiplicity

def test():
    N_1 = 2000
    R_1 = 0.1
    k = 0
    iterations = 1
    while k < iterations:
      G = nx.random_geometric_graph(N_1, R_1)
      if nx.is_connected(G) == True:
          print 'got connected network'
          k = k+1
          M=nx.laplacian_matrix(G)      #M is here a sparse matrix
          M = M.astype(float)
          #M[0,0]=M[0,0]+1.  # this makes the laplacian_matrix positive definite.
          #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M, k = N_1-2)
          kkeig=400
          sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='SM')
          print sparsevalue

          listofeigval=[]
          listofmultiplicity=[]
          listofeigval,listofmultiplicity=tolist(sparsevalue)
          print len(listofeigval), len(listofmultiplicity)

          nbeigfound=0
          for mul in listofmultiplicity:
              nbeigfound=nbeigfound+mul
          keepgoing=True
          while( nbeigfound<N_1):
               print '(',nbeigfound,'/',N_1,')  is ', listofeigval[-1]
               meanspacing=0.
               meanspacingnb=0.

               for i in range(10):
                     meanspacingnb=meanspacingnb+listofmultiplicity[len(listofeigval)-i-1]
               meanspacing=(listofeigval[-1]-listofeigval[len(listofeigval)-10])/meanspacingnb
               sig=listofeigval[-1]+0.1*kkeig*meanspacing

               keeplast=False
               if nbeigfound<N_1-0.5*kkeig:
                   sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig,sigma=sig, which='LM', mode='normal')
               else:
                   keeplast=True
                   sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = kkeig, which='LM')
               listofneweigval,listofnewmultiplicity=tolist(sparsevalue,keeplast)
               #need to retreive the starting point
               index=len(listofeigval)-2*kkeig/10
               if listofneweigval[1]>listofeigval[index]:
                   while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                       index=index+1
               else:
                   while(np.abs(listofneweigval[1]-listofeigval[index])>1e-10*(np.abs(listofneweigval[1])+np.abs(listofeigval[index]))):
                       index=index-1
               #The first matching eigenvalue is found.
               #zipping the list and checking if it works.
               i=1
               while index<len(listofeigval) and i<len(listofneweigval) :
                    if (np.abs(listofneweigval[i]-listofeigval[index])>1e-10*(np.abs(listofneweigval[i])+np.abs(listofeigval[index]))):
                         print 'failed after ', index, ' different eigenvalues', ': wrong eigenvalue'
                         return listofeigval[0:index], listofmultiplicity[0:index], 1
                    if listofmultiplicity[index] != listofnewmultiplicity[i] :
                         print 'failed after ', index, ' different eigenvalues', ': wrong multiplicity'
                         return listofeigval[0:index], listofmultiplicity[0:index], 2
                    index=index+1
                    i=i+1
               #adding the remaining eigenvalues.
               while i<len(listofneweigval) :
                    listofeigval.append(listofneweigval[i])
                    listofmultiplicity.append(listofnewmultiplicity[i])
                    nbeigfound=nbeigfound+listofnewmultiplicity[i]
                    i=i+1
          print 'number of eigenvalues: ', nbeigfound
          nbl=0
          for i in range(len(listofeigval)):
               print 'eigenvalue ',i,' (',nbl,'/',N_1,')  is %14.8f'% listofeigval[i], ' with multiplicity', listofmultiplicity[i]
               nbl=nbl+listofmultiplicity[i]

          return listofeigval, listofmultiplicity, 0


          #sig=39.1
          #sparsevalue, sparsevector = sp.sparse.linalg.eigsh(M.copy(), k = 100,sigma=sig, which='LM', mode='normal')
          #print sparsevalue

test()

For Hermitian matrices featuring real positive and negative eigenvalues, both positive and negative values of sigma are to be explored. If the matrix is not Hermitian, the eigenvalues may not be real and values of sigma on the complex plane are to be chosen. Searching first for the magnitude of the largest eigenvalue of A limits the area to a disk.

The proposed method is very slow and may not always work. It worked once for a 20000x20000 matrix, using 1Go of memory.