I am trying to retrieve embeddings for words based on the pretrained ELMo model available on tensorflow hub. The code I am using is modified from here: https://www.geeksforgeeks.org/overview-of-word-embedding-using-embeddings-from-language-models-elmo/
The sentence that I am inputting is
bod =" is coming up in and every project is expected to do a video due on we look forward to discussing this with you at our meeting this this time they have laid out the selection criteria for the video award s go for the top spot this time "
and these are the keywords I want embeddings for:
words=["do", "a", "video"]
embeddings = elmo([bod],
signature="default",
as_dict=True)["elmo"]
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
this sentence is 236 characters in length. this is the picture showing that
but when I put this sentence into the ELMo model, the tensor that is returned is only contains a string of length 48
and this becomes a problem when i try to extract embeddings for keywords that are outside the 48 length limit because the indices of the keywords are shown to be outside this length:
this is the code I used to get the indices for the words in 'bod'(as shown above)
num_list=[]
for item in words:
print(item)
index = bod.index(item)
num_list.append(index)
num_list
But i keep running into this error:
I tried looking for ELMo documentation to explain why this is happening but I have not found anything related to this problem of pruned input.
Any advice is much appreciated!
Thank You
This is not really an AllenNLP issue since you are using a tensorflow-based implementation of ELMo.
That said, I think the problem is that ELMo embeds tokens, not characters. You are getting 48 embeddings because the string has 48 tokens.