java similarity lsa jama

Problems using Jama in Java for LSA


I am using the Jama package to perform LSA. I was told to reduce the dimensionality, so I reduced it to 3 in this case and reconstructed the matrix. But the resulting matrix is very different from the one I gave to the system.

Here's the code:

    a = new Matrix(termdoc); // build the matrix from the term-document array
    a = a.transpose(); // the matrix is docs x terms, so I transpose it
    SingularValueDecomposition sv = new SingularValueDecomposition(a);
    u = sv.getU();
    v = sv.getV();
    s = sv.getS();
    uarray = u.getArray();
    sarray = s.getArray();
    varray = v.getArray();
    sarray_mod = new double[3][3]; // reducing dimension
    uarray_mod = new double[uarray.length][3];
    varray_mod = new double[3][varray.length];
    move(sarray, 3, 3, sarray_mod); // my method to copy the contents
    move(uarray, uarray.length, 3, uarray_mod);
    move(varray, 3, varray.length, varray_mod);
    e = new Matrix(uarray_mod);
    f = new Matrix(sarray_mod);
    g = new Matrix(varray_mod);
    Matrix temp = e.times(f);
    result = temp.times(g);
    result = result.transpose();
    results = result.getArray();
    System.out.println(" The array after svd : \n");
    print(results); // my method to print the array

    // Copies the top-left r x c block of sarray2 into sarrayMod.
    private static void move(double[][] sarray2, int r, int c,
            double[][] sarrayMod) {
        for (int i = 0; i < r; i++)
            for (int t = 0; t < c; t++)
                sarrayMod[i][t] = sarray2[i][t];
    }

A sample output with just 3 files, two of which are similar:

0.25 0 0 0 0 0 0 0 0.25 0 0.25 0.25 0 

0 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0 0.083 0.083 0.167 0.083 

0.25 0 0 0 0 0 0 0 0.25 0 0.25 0.25 0 

The array after svd :

0.225 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.225 0.029 0.253 0.282 0.029 

-0.121 0.077 0.077 0.077 0.077 0.077 0.077 0.077 -0.121 0.077 -0.044 0.033 0.077 

0.245 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.245 0.012 0.257 0.269 0.012 

Solution

  • Go through the example Here

    In the example, we take the first 2 columns from U, S and V and then multiply them. This will not give you back the same matrix, but it improves the similarity results.

    If you have gone through the example, you will find that the similarity between "user" and "human" was negative. After we performed SVD, the similarity increased to a positive value close to 1.

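    I think the way you are moving the values is correct for U and S. For V, keep in mind that Jama factors the matrix as A = U * S * V' and getV() returns V itself, so the last factor has to be the transpose of V's first columns, not V's first rows. Just go through the example once.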
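
The truncation step described above can be sketched in plain Java, without depending on Jama. This is only an illustration under some assumptions: `LowRankSketch`, `reconstruct` and `cosine` are hypothetical names, the arrays are assumed to have the shapes that `getU().getArray()`, `getS().getArray()` and `getV().getArray()` return, and S is assumed to be diagonal. One detail worth checking against the question's code is the last factor: the sketch multiplies by the transpose of V's first k columns. In the sample data the first and third input rows are identical but the output rows are not, which a truncation of this form should not produce, so the V factor is a likely place to look.

```java
/** Hypothetical helper, not part of Jama: low-rank reconstruction on plain arrays. */
final class LowRankSketch {

    /**
     * Computes U_k * S_k * V_k^T, keeping the first k columns of u and v
     * and the first k diagonal entries of s (s is assumed to be diagonal).
     */
    static double[][] reconstruct(double[][] u, double[][] s, double[][] v, int k) {
        int rows = u.length; // rows of the decomposed matrix
        int cols = v.length; // V is square: one row per column of the decomposed matrix
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int r = 0; r < k; r++) {
                    // v[j][r], not v[r][j]: this indexes the transpose of V's first k columns
                    sum += u[i][r] * s[r][r] * v[j][r];
                }
                out[i][j] = sum;
            }
        }
        return out;
    }

    /** Cosine similarity of two equal-length vectors; returns 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        // Hand-made factors (not real data): U = I, S = diag(3, 1), V = 90-degree rotation.
        double[][] u = {{1, 0}, {0, 1}};
        double[][] s = {{3, 0}, {0, 1}};
        double[][] v = {{0, -1}, {1, 0}};
        double[][] rank1 = reconstruct(u, s, v, 1);
        System.out.println(java.util.Arrays.deepToString(rank1));
        System.out.println(cosine(rank1[0], new double[] {0, 1}));
    }
}
```

With Jama, the three arrays would come from the `SingularValueDecomposition` object already built in the question, and `cosine` could then be applied to the rows or columns of the reconstructed matrix that represent the documents being compared. Since the question decomposes a 13 x 3 matrix, k = 3 keeps every singular value, so a reconstruction of this form should return the original matrix up to rounding; a smaller k is needed to see the LSA effect.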