python pyspark graphframes

Update vertices values in GraphFrame


Is there any way to update vertex (or edge) values after constructing a graph with GraphFrame? I have a graph whose vertices have the columns ['id', 'name', 'age']. I've written code that creates vertices with new ages, and it works fine. However, when I try to assign these new vertices to the existing graph's vertices, I get a "can't set attribute" error.

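from graphframes import GraphFrame
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active SparkSession (already available as `spark` in notebooks and the pyspark shell)
spark = SparkSession.builder.getOrCreate()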

# Vertex DataFrame
v = spark.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Edge DataFrame
e = spark.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

# Update Vertices
updated_vertices = (
    g.vertices
    .withColumn('new_age', F.lit(10))
    .select(
        'id',
        'name',
        F.col('new_age').alias('age')
    )
)

# Set new vertices
g.vertices = updated_vertices

AttributeError: can't set attribute

Should I reconstruct a new graph object? Or is there a better way to do this?

Thank you.


Solution

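  • You have to create a new graph object to apply the update, because vertices and edges cannot be reassigned on an existing GraphFrame. However, since a GraphFrame is only two DataFrames (vertices and edges), rebuilding it is cheap to write:

    g = GraphFrame(updated_vertices, e)

    Assigning the result to the same name, g, means the rest of your code keeps working with the updated graph.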
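To illustrate why the assignment fails and what the rebuild pattern looks like, here is a minimal sketch that runs without Spark. ReadOnlyGraph is a hypothetical stand-in, not the real GraphFrame class; it assumes that vertices and edges are exposed as getter-only properties, which is what the error in the question suggests. The helper with_vertices is likewise an invented name for illustration.

```python
class ReadOnlyGraph:
    """Hypothetical stand-in for GraphFrame: vertices and edges have getters only."""

    def __init__(self, vertices, edges):
        self._vertices = vertices
        self._edges = edges

    @property
    def vertices(self):
        return self._vertices

    @property
    def edges(self):
        return self._edges


def with_vertices(graph, new_vertices):
    """Build a new graph of the same class, reusing the existing edges."""
    return type(graph)(new_vertices, graph.edges)


g = ReadOnlyGraph([("a", "Alice", 34)], [("a", "b", "friend")])

# Assigning to a property that has no setter raises AttributeError.
try:
    g.vertices = [("a", "Alice", 10)]
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Rebuild instead, and rebind the same name to the new object.
g = with_vertices(g, [("a", "Alice", 10)])
```

With the real library, the same idea would be g = GraphFrame(updated_vertices, g.edges), which also avoids having to keep the original e variable around.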