
What Is the "Default" Vocabulary Size of Word2Vec in the Deeplearning4j Library?


I am currently learning the Word2Vec implementation in the Deeplearning4j library (Homepage, GitHub).

The following is an example of how to use it:

//build Word2Vec model
Word2Vec vec = new Word2Vec.Builder()
                .layerSize(100)
                .windowSize(5)
                .stopWords(stopList)
                .tokenizerFactory(t)
                .learningRate(0.025)
                .build();

I know that I can limit the vocabulary size by adding this call to the builder chain (before .build()):

                .limitVocabularySize(100) // limit the vocabulary to 100 words

This limits the vocabulary to 100 words.

My question:
Could anyone tell me what the default vocabulary size is (i.e., if I do not set a limit)?



Solution

  • By default there is no limit, which means every word found in the corpus is added to the vocabulary.

    Also note that the examples you linked to are over four years old. I suggest using the official examples instead: https://github.com/eclipse/deeplearning4j-examples
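To make "no limit" versus "limit of N" concrete, here is a conceptual sketch in plain Java. It is not Deeplearning4j code. The class VocabLimitSketch and the method buildVocab are hypothetical. The sketch assumes that a limit keeps the N most frequent words and that a limit of 0 or less means "unlimited". These are illustrative assumptions, not a description of DL4J's internals.

```java
import java.util.*;

public class VocabLimitSketch {
    /**
     * Hypothetical helper (not part of DL4J): builds a vocabulary from tokens.
     * Assumption: limit <= 0 means "no limit", so every distinct word is kept;
     * otherwise only the `limit` most frequent words are kept.
     */
    static List<String> buildVocab(List<String> tokens, int limit) {
        // Count how often each word occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }

        // Sort by descending frequency, breaking ties alphabetically for determinism.
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort(Comparator.<String>comparingInt(w -> counts.get(w)).reversed()
                .thenComparing(Comparator.naturalOrder()));

        // No limit (or a limit larger than the vocabulary): keep every word.
        if (limit <= 0 || limit >= words.size()) {
            return words;
        }
        // Otherwise keep only the most frequent `limit` words.
        return new ArrayList<>(words.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the", "cat", "sat", "on", "the", "mat", "the", "cat");
        System.out.println(buildVocab(tokens, 0)); // no limit
        System.out.println(buildVocab(tokens, 2)); // limit of 2
    }
}
```

With no limit, all five distinct words are kept ([the, cat, mat, on, sat]). With a limit of 2, only the two most frequent words remain ([the, cat]).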