javaapache-sparkgraphframes

Getting shortestPaths in GraphFrames with Java


I am new to Spark and GraphFrames.

When I wanted to learn about shortestPaths method in GraphFrame, GraphFrames documentation gave me a sample code in Scala, but not in Java.

In their document, they provided following (Scala code):

import org.graphframes.{examples,GraphFrame}
val g: GraphFrame = examples.Graphs.friends  // get example graph

val results = g.shortestPaths.landmarks(Seq("a", "d")).run()
results.select("id", "distances").show()

and in Java, I tried:

import org.graphframes.GraphFrames;
import scala.collection.Seq;
import scala.collection.JavaConverters;

GraphFrame g = new GraphFrame(...,...);
Seq landmarkSeq = JavaConverters.collectionAsScalaIterableConverter(Arrays.asList((Object)"a",(Object)"d")).asScala().toSeq();
g.shortestPaths().landmarks(landmarkSeq).run().show();

or

g.shortestPaths().landmarks(new ArrayList<Object>(List.of((Object)"a",(Object)"d"))).run().show();

Casting to java.lang.Object was necessary since the API demands Seq<Object> or ArrayList<Object> and I could not pass ArrayList<String> to compile it right.

After running the code, I saw the message:

Exception in thread "main" org.apache.spark.sql.AnalysisException: You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`
2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive
3. set spark.sql.legacy.allowUntypedScalaUDF to true and use this API with caution;

To follow the 3., I have added the code:

System.setProperty("spark.sql.legacy.allowUntypedScalaUDF","true");

but situation did not change.

Since there are limited number of sample code or stackoverflow questions about GraphFrames in Java, I could not find any useful information while seeking around.

Could anyone experienced in this area help me solve this problem?


Solution

  • This seems a bug in GraphFrames 0.8.0.

    See Issue #367 in github.com