vector, machine-learning, nlp, word2vec

Understanding Word2Vec's Skip-Gram Structure and Output


I have a two-fold question about the Skip-Gram model in Word2Vec:

The way I imagine it is something along the following lines (made-up example):

Assuming the vocabulary ['quick', 'fox', 'jumped', 'lazy', 'dog'] and a context of C=1, and assuming that for the input word 'jumped' I see the two output vectors looking like this:

[0.2 0.6 0.01 0.1 0.09]

[0.2 0.2 0.01 0.16 0.43]

I would interpret this as 'fox' being the most likely word to show up before 'jumped' (p=0.6), and 'dog' being the most likely to show up after it (p=0.43).
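
In numpy terms, I picture reading the outputs off roughly like this (a rough sketch with my made-up numbers; `before`/`after` are just my own names for the two output vectors):

    import numpy as np

    vocab = ['quick', 'fox', 'jumped', 'lazy', 'dog']
    before = np.array([0.2, 0.6, 0.01, 0.1, 0.09])   # distribution for the word before 'jumped'
    after  = np.array([0.2, 0.2, 0.01, 0.16, 0.43])  # distribution for the word after 'jumped'

    print(vocab[before.argmax()], before.max())      # fox 0.6
    print(vocab[after.argmax()], after.max())        # dog 0.43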

Do I have this right? Or am I completely off?


Solution

  • Your understanding of both parts seems to be correct, according to this paper:

    http://arxiv.org/abs/1411.2738

    The paper explains word2vec in detail while keeping things simple; it's worth a read for a thorough understanding of the neural net architecture used in word2vec.

    Referring to the example you mentioned, with C=1 and the vocabulary ['quick', 'fox', 'jumped', 'lazy', 'dog']: if the output from the skip-gram is [0.2 0.6 0.01 0.1 0.09] and the correct target word is 'fox', then the error is calculated as:

    [0 1 0 0 0] - [0.2 0.6 0.01 0.1 0.09] = [-0.2 0.4 -0.01 -0.1 -0.09]
    

    and the weight matrices are updated to minimize this error.
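
    To make that arithmetic concrete, here is a minimal numpy sketch of the error and of one plain gradient-descent step on the output weight matrix; the hidden vector, weight matrix, embedding size, and learning rate below are made-up stand-ins for illustration, not values taken from the question:

        import numpy as np

        vocab = ['quick', 'fox', 'jumped', 'lazy', 'dog']
        hidden_size = 3                                    # made-up embedding size

        y_pred = np.array([0.2, 0.6, 0.01, 0.1, 0.09])     # softmax output of the skip-gram
        y_true = np.array([0.0, 1.0, 0.0, 0.0, 0.0])       # one-hot target for 'fox'

        error = y_true - y_pred
        print(error)                                       # [-0.2   0.4  -0.01 -0.1  -0.09]

        # One gradient-descent step on a stand-in output weight matrix W_out,
        # using the hidden-layer activation h for the input word 'jumped'.
        lr = 0.025
        h = np.random.rand(hidden_size)                    # stand-in hidden vector
        W_out = np.random.rand(hidden_size, len(vocab))    # stand-in output weights
        W_out += lr * np.outer(h, error)                   # nudges probability toward 'fox'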