r text2vec

Error running glmnet on two combined DTMs (via cBind) in text2vec


I created a tf-idf DTM and an n-gram-based DTM in text2vec, using the same dataset. I am able to run glmnet on each of them separately, but when I combine the two DTMs via cBind, glmnet gives me an error:

Error in validObject(.Object) : invalid class “dgCMatrix” object: length(Dimnames[1]) differs from Dim[1] which is 43895

dtm_train_tfidf = (19579 * 27511) matrix, and

dtm_train_ngram = (19579 * 16384) matrix.

This means they have exactly the same number of rows, so I can combine them using cBind (the sparse-matrix counterpart of cbind) and get one large matrix on which I should be able to run glmnet. However, glmnet fails on the combined matrix with the error above. How do I fix this?


Solution

  • This is due to a bug in text2vec: https://github.com/dselivanov/text2vec/issues/205. You can either install the development version from GitHub, or simply drop the column names of the DTM produced by the hash vectorizer before combining the two matrices.
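
A minimal sketch of the second workaround follows. It does not call text2vec: the two matrices are small random stand-ins built with the Matrix package, reusing the names dtm_train_tfidf and dtm_train_ngram from the question, and dtm_train_combined is a name made up for illustration. It assumes dtm_train_ngram is the DTM that came from the hash vectorizer, and it uses cbind rather than cBind on the assumption that a reasonably recent R and Matrix are installed.

```r
library(Matrix)

# Toy stand-ins for the two DTMs in the question (hypothetical data,
# same number of rows, different numbers of columns).
set.seed(1)
dtm_train_tfidf <- rsparsematrix(5, 4, density = 0.5)
colnames(dtm_train_tfidf) <- paste0("term_", 1:4)
dtm_train_ngram <- rsparsematrix(5, 8, density = 0.5)

# Workaround: drop the column names of the hashed DTM before combining.
colnames(dtm_train_ngram) <- NULL

# Combine column-wise; the result is still a sparse matrix.
dtm_train_combined <- cbind(dtm_train_tfidf, dtm_train_ngram)

# Check that the combined object is a valid sparse matrix.
validObject(dtm_train_combined)
dim(dtm_train_combined)

# The combined matrix can then be passed to glmnet as usual, e.g.
# fit <- glmnet::cv.glmnet(x = dtm_train_combined, y = y)  # y: your labels
```

For the first option, the development version can typically be installed with devtools::install_github("dselivanov/text2vec"), assuming the devtools package is available.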