neo4j cypher embedding graphsage

Convert a node with multiple attribute types (int, float, string) to an embedding using Neo4j and GraphSAGE?


If my nodes look like:

{id: 1, name: "John", last_name: "Doe", age: 40, city: "New York", credit_score: 5.5}
{id: 2, name: "Linda", last_name: "Lumbo", age: 32, city: "Washington", credit_score: 5.5}
{id: 3, name: "Greg", last_name: "Tanta", age: 28, city: "New York", credit_score: 5.5}
{id: 4, name: "Donald", last_name: "Greenboim", age: 64, city: "Tel Aviv", credit_score: 5.5}
{id: 5, name: "Leo", last_name: "Greenhouse", age: 98, city: "Paris", credit_score: 5.5}
{id: 6, name: "John", last_name: "Opelbaum", age: 80, city: "Moscow", credit_score: 1}
{id: 7, name: "John", last_name: "Vein", age: 21, city: "Los Angeles", credit_score: 0.35}
{id: 8, name: "Dino", last_name: "Lodz", age: 34, city: "New York", credit_score: 1.5}
{id: 9, name: "Kurt", last_name: "Kreston", age: 89, city: "New York", credit_score: 5.3}
{id: 10, name: "Alex", last_name: "Mulo", age: 22, city: "Moscow", credit_score: 2.5}
{id: 11, name: "John", last_name: "Tolo", age: 32, city: "Liverpool", credit_score: 0.5}
{id: 12, name: "Trent", last_name: "Benson", age: 57, city: "London", credit_score: 5.114}
{id: 13, name: "Tom", last_name: "Richardson", age: 23, city: "New York", credit_score: 0.986}
....

Assume all nodes are interconnected and I want to apply the GraphSAGE algorithm to their attributes. For some reason I can't get the embeddings when my attributes are strings. How can I apply GraphSAGE to nodes with string attributes, or with mixed types (float, int, string)? I get this error:

Failed to invoke procedure gds.graph.create: Caused by: java.lang.UnsupportedOperationException: Loading of values of type String is currently not supported


Solution

  • If you want to run GraphSAGE on string attributes, you need to apply one-hot encoding or some other technique to transform them into a number or a list of numbers. The property type cannot be a mix of various data types; it has to be consistent across all properties. AFAIK, this is valid for any library that includes GraphSAGE, not just Neo4j GDS.

    You can probably skip the id property, as it doesn't carry any additional information. For city, name, and last_name, you can use either one-hot encoding or word embeddings to include those properties in GraphSAGE; the decision is yours.
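
To make the one-hot encoding step concrete, here is a minimal Python sketch that turns the string properties into lists of floats before they are loaded into the graph. It is only an illustration under stated assumptions: the helper names (`one_hot_encoder`, `to_numeric_properties`) and the output property names (`city_onehot`, `name_onehot`) are hypothetical, the sample is a three-node subset of the data in the question, and `id` is kept only as a key for matching rows back to nodes, not as a feature.

```python
# Hypothetical subset of the nodes from the question.
nodes = [
    {"id": 1, "name": "John", "city": "New York", "age": 40, "credit_score": 5.5},
    {"id": 2, "name": "Linda", "city": "Washington", "age": 32, "credit_score": 5.5},
    {"id": 6, "name": "John", "city": "Moscow", "age": 80, "credit_score": 1},
]


def one_hot_encoder(values):
    """Build an encoder that maps each distinct value to a one-hot list of floats."""
    # Sort the distinct values so the position of each category is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    def encode(value):
        vector = [0.0] * len(categories)
        vector[index[value]] = 1.0
        return vector

    return encode


# One encoder per string property (assumption: every value is known up front).
encode_city = one_hot_encoder(node["city"] for node in nodes)
encode_name = one_hot_encoder(node["name"] for node in nodes)


def to_numeric_properties(node):
    """Return a copy of the node in which every feature is a float or a list of floats."""
    return {
        "id": node["id"],  # kept only to match the row back to its node
        "age": float(node["age"]),
        "credit_score": float(node["credit_score"]),
        "city_onehot": encode_city(node["city"]),
        "name_onehot": encode_name(node["name"]),
    }


rows = [to_numeric_properties(node) for node in nodes]
for row in rows:
    print(row)
```

The resulting values could then be written back to the nodes as numeric properties and used as GraphSAGE features in place of the original strings, assuming your GDS version accepts list-valued numeric properties. With many distinct cities or last names the one-hot vectors get long and sparse, which is where the word-embedding option mentioned above may be the better fit.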