r h2o svmlight

Reading svmlight format with h2o


I am using the h2o R package (v3.24.0.5) for some deep learning and need to import a large sparse matrix (2M x 10k) into it. I first tried fwrite, but got a Cholmod "problem too large" error, so I went with the svmlight format instead. The original matrix looks like this:

    Count    Dist    
1   nan     10.1266
2   859.124 10.8198
3   nan     10.1266

For this I used the sparsio package. Writing works fine, but when reading the file with h2o.importFile I noticed something wrong: the column index appears in front of every number, as you can see below:

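library(sparsio)
library(h2o)
h2o.init()

# Write the sparse matrix in svmlight format (sparsio defaults)
write_svmlight(HiC_mat.All, file="Rdata/mat_kmer-NA.txt")

# Import the file into H2O
HIC_df = h2o.importFile("Rdata/mat_kmer-NA.txt")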

HIC_df[1:3,1:3]
  C1        C2        C3
1  0     0:nan 1:10.1266
2  0 0:859.124 1:10.8198
3  0     0:nan 1:10.1266

Any idea how I can get rid of these?

Data should look like this:

  C1        C2        C3
1  0       nan     10.1266
2  0    859.124    10.8198
3  0       nan     10.1266

Solution

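  • OK, so the problem does indeed seem to be in how the svmlight file was written. The first file used 0-based column indices (the "0:" and "1:" prefixes above). Writing it with 1-based indices instead fixed the import:

    write_svmlight(x, y = numeric(nrow(x)), file = filename, zero_based = FALSE) 
    

    and it works for now.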
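
To see what zero_based = FALSE changes in the written file, here is a minimal sketch on a tiny made-up matrix. The matrix values, the temporary file, and the variable names are illustrative assumptions, not the original data. It writes the file with sparsio and then pulls the feature indices out of the text lines; with zero_based = FALSE the smallest index should be 1 rather than 0.

```r
library(Matrix)
library(sparsio)

# Hypothetical 3 x 2 sparse matrix standing in for the real 2M x 10k one
x <- sparseMatrix(
  i = c(1, 2, 2, 3),
  j = c(2, 1, 2, 2),
  x = c(10.1266, 859.124, 10.8198, 10.1266),
  dims = c(3, 2)
)

# Temporary output file (assumption: any writable path works here)
f <- tempfile(fileext = ".svmlight")

# Dummy all-zero labels, 1-based feature indices
write_svmlight(x, y = numeric(nrow(x)), file = f, zero_based = FALSE)

# Each line has the form: <label> <index>:<value> <index>:<value> ...
lines <- readLines(f)
print(lines)

# Extract every feature index (the digits right before a colon)
idx <- as.integer(unlist(
  regmatches(lines, gregexpr("[0-9]+(?=:)", lines, perl = TRUE))
))
min_index <- min(idx)
print(min_index)
```

If the smallest index printed is 1, the file uses 1-based indices and can be passed to h2o.importFile in the same way as in the question.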