Tags: nlp, networkx, word2vec, grapheme-cluster

Can word2vec deal with sequences of numbers?


I am very new to network embedding, especially attributed network embedding. I am currently studying the node2vec algorithm, and I think the process is:

1. Run random walks with parameters p and q
2. Feed the walks to Word2Vec

For the second step, I see that the algorithm treats every node as a string.

But my problem is that the nodes of my network are numeric values, and some nodes may have the same value. I think this strategy will treat nodes with the same value as a single node.

What should I do if I want to embed such a network? My network is an attributed graph in which each node has n-dimensional attributes.

Thanks so much!


Solution

  • I believe most applications of word2vec to graphs give each node a unique ID, which is then used as the 'word' token fed to the algorithm. If your nodes carry other values that can repeat, those values aren't ideal as node IDs, because every node sharing a value would collapse into a single token.

    (While word2vec doesn't natively handle continuous magnitudes, there has been some research extending it in that direction. For example, I think Facebook's 'StarSpace' allows mixing scalar features with the discrete tokens of traditional word2vec. You could also consider banding ranges of your nodes' scalar dimensions into discrete tokens, which could sometimes be used instead of IDs to learn embeddings for what a range of values tends to be related to.)
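
To make the two ideas above concrete, here is a minimal sketch using only the standard library. The toy graph, the `value` attribute, and the helper names `random_walk` and `band_token` are all hypothetical, made up for illustration. The walk is a plain uniform random walk, not node2vec's p/q-biased walk, and the equal-width banding is just one possible way to discretise a scalar.

```python
import random

# Hypothetical toy graph: unique node ID -> list of neighbour IDs.
graph = {
    "n0": ["n1", "n2"],
    "n1": ["n0", "n2"],
    "n2": ["n0", "n1", "n3"],
    "n3": ["n2"],
}

# Hypothetical scalar attribute per node; n0 and n3 share the same value,
# which is why the value itself is a poor node identifier.
value = {"n0": 0.5, "n1": 2.7, "n2": 9.1, "n3": 0.5}


def random_walk(graph, start, length, rng):
    """Uniform random walk (no p/q bias) returning a list of node IDs."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk


def band_token(x, low, high, n_bands):
    """Map a scalar in [low, high] to a discrete token for its band."""
    width = (high - low) / n_bands
    index = int((x - low) / width)
    index = min(max(index, 0), n_bands - 1)  # clamp values on or past the edges
    return f"val_band_{index}"


rng = random.Random(0)

# Option 1: walks over unique node IDs (one token per node).
id_walks = [random_walk(graph, node, 5, rng) for node in graph for _ in range(10)]

# Option 2: the same walks, with each node replaced by its value-band token.
band_walks = [
    [band_token(value[node], 0.0, 10.0, 4) for node in walk] for walk in id_walks
]
```

Either list of token sequences could then be passed to a word2vec implementation as its "sentences". With gensim 4.x, for instance, that would be something like `Word2Vec(sentences=id_walks, vector_size=64, window=5, min_count=1, sg=1)`, where the hyperparameter values are placeholders rather than recommendations. With ID tokens you get one vector per node; with band tokens you get one vector per range of values, so nodes in the same band are deliberately merged.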