I am a beginner in Machine Learning and trying Document Embedding for a university project. I work with Google Colab and Jupyter Notebook (via Anaconda). The problem is that my code is perfectly running in Google Colab but if i execute the same code in Jupyter Notebook (via Anaconda) I run into an error with the ConcatenatedDoc2Vec Object.
With this function I build the vector features for a Classifier (e.g. Logistic Regression).
def build_vectors(model, length, vector_size):
vector = np.zeros((length, vector_size))
for i in range(0, length):
prefix = 'tag' + '_' + str(i)
vector[i] = model.docvecs[prefix]
return vector
I concatenate two Doc2Vec Models (d2v_dm, d2v_dbow
), both are working perfectly trough the whole code and have no problems with the function build_vectors()
:
d2v_combined = ConcatenatedDoc2Vec([d2v_dm, d2v_dbow])
But if I run the function build_vectors()
with the concatenated model:
#Compute combined Vector size
d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)
I receive this error (but only if I run this in Jupyter Notebook (via Anaconda) -> no problem with this code in the Notebook in Google Colab):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [20], in <cell line: 4>()
1 #Compute combined Vector size
2 d2v_combined_vector_size = d2v_dm.vector_size + d2v_dbow.vector_size
----> 4 d2v_combined_vec= build_vectors(d2v_combined, len(X_tagged), d2v_combined_vector_size)
Input In [11], in build_vectors(model, length, vector_size)
3 for i in range(0, length):
4 prefix = 'tag' + '_' + str(i)
----> 5 vector[i] = model.docvecs[prefix]
6 return vector
AttributeError: 'ConcatenatedDoc2Vec' object has no attribute 'docvecs'
Since this is mysterious (for me) -> Working in Google Colab but not Anaconda and Juypter Notebook -> and I did not find anything to solve my problem in the web.
If it's working one place, but not the other, you're probably using different versions of the relevant libraries – in this case, gensim
.
Does the following show exactly the same version in both places?
import gensim
print(gensim.__version__)
If not, the most immediate workaround would be to make the place where it doesn't work match the place that it does, by force-installing the same explicit version – pip intall gensim==VERSION
(where VERSION
is the target version) – then ensuring your notebook is restarted to see the change.
Beware, though, that unless starting from a fresh environment, this could introduce other library-version mismatches!
Other things to note:
docvecs
is now called just dv
- so some older code erroring this way may only need docvecs
replaced with dv
to work. (Other tips for migrating older code to the latest Gensim conventions are available at: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4 )ConcatenatedDoc2Vec
class from. A clas of that name exists in some Gensim demo/test code, as a very minimal shim class that was at one time used in attempts to reproduce the results of the original "Paragaph Vector" (aka Doc2Vec
) paper. But beware: that's not a usual way to use Doc2Vec
, & the class of that name I know barely does anything outside its original narrow purpose.Doc2Vec
just pick one mode.Doc2Vec
code from that one experiment.