nlp, flair

Is there any way to get the actual vector embedding of a word or set of characters using flair NLP, i.e., Flair embeddings?


Basically, I'm trying to use a custom Flair language model to get a word's or sentence's embedding as a vector. Is this possible, or do Flair embeddings only function when using Flair NER models?

When using the embeddings' .embed() function I receive an output like "[Sentence: "pain" − Tokens: 1]", whereas I'm looking for the vector of continuous numbers.

Thank you.


Solution

  • I'm quite confused, because there is an official tutorial on word embeddings by the flair authors themselves that seems to cover exactly this topic. I suspect the problem is that you are confusing the processed Sentence object returned by .embed() (which is what prints as "[Sentence: ...]") with the actual .embedding property of its tokens.

    In any case, you can simply iterate over the word embeddings of individual tokens like so (taken from the tutorial mentioned above):

    from flair.embeddings import WordEmbeddings
    from flair.data import Sentence
    
    # init embedding
    glove_embedding = WordEmbeddings('glove')
    
    # create sentence.
    sentence = Sentence('The grass is green .')
    
    # embed a sentence using glove.
    glove_embedding.embed(sentence)
    
    # now check out the embedded tokens.
    for token in sentence:
        print(token)
        print(token.embedding)
    

    I am not familiar enough with flair to know whether you can apply it to arbitrary character sequences, but it worked for individual tokens for me.
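
    Since the question also asks about sentence-level vectors from a custom Flair language model, here is a sketch of how that could look, assuming the standard flair API: `FlairEmbeddings('news-forward')` stands in for the path to your own trained language model checkpoint, and `DocumentPoolEmbeddings` pools the token embeddings into a single sentence vector.

    ```python
    from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings
    from flair.data import Sentence

    # a pretrained Flair language model; swap in the path to your own
    # custom model checkpoint instead of 'news-forward'
    flair_embedding = FlairEmbeddings('news-forward')

    # pool the token embeddings (mean pooling by default) into one
    # fixed-size vector for the whole sentence
    document_embeddings = DocumentPoolEmbeddings([flair_embedding])

    sentence = Sentence('pain')
    document_embeddings.embed(sentence)

    # sentence.embedding is a torch.Tensor of continuous numbers
    print(sentence.embedding.shape)
    ```

    The same `.embedding` property should give you the raw tensor for single-word "sentences" as well, which matches the "pain" example from the question.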