rword-embeddingtext2vecglove

Using GLOVEs pretrained glove.6B.50.txt as a basis for word embeddings R


I'm trying to convert textual data into vectors using GLOVE in r. My plan was to average the word vectors of a sentence, but I can't seem to get to the word vectorization stage. I've downloaded the glove.6b.50.txt file and it's parent zip file from: https://nlp.stanford.edu/projects/glove/ and I have visited text2vec's website and tried running through their example where they load wikipedia data. But I dont think its what I'm looking for (or perhaps I am not understanding it). I'm trying to load the pretrained embeddings into a model so that if I have a sentence (say 'I love lamp') I can iterate through that sentence and turn each word into a vector that I can then average (turning unknown words into zeros) with a function like vectorize(word). How do I load the pretrained embeddings into a glove model as my corpus (and is that even what I need to do to accomplish my goal?)


Solution

  • I eventually figured it out. The embeddings matrix is all I needed. It already has the words in their vocab as rownames, so I use those to determine the vector of each word.

    Now I need to figure out how to update those vectors!