I want to rewrite this code:
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///root/spark/README.md"
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}
to this:
import org.apache.livy._
import org.apache.spark.sql.SparkSession

class Test extends Job[Int] {
  override def call(jc: JobContext): Int = {
    val spark = jc.sparkSession()
    val logFile = "file:///root/spark/README.md"
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    1 // Return value
  }
}
But when I compile it with sbt, val spark is not inferred correctly and I get the error "value read is not a member of Nothing".
Also, after commenting out the Spark-related code, when I try to run the resulting jar file via /batches I get the error "java.lang.NoSuchMethodException: Test.main([Ljava.lang.String;)".
Can anybody show the correct way to rewrite this Spark Scala code?
There's no need to rewrite your Spark application (or to implement Livy's Job interface) in order to use Livy. The /batches endpoint runs an ordinary application with a main method, much like spark-submit does, which is why submitting the Test class fails with NoSuchMethodException. Instead, you can use Livy's REST interface to submit your original jar to a cluster that has a running Livy server, retrieve logs, get the job state, etc.
As a practical example, here are instructions to run your application on AWS.
Setup:
- Build your application jar and upload it to S3 so the cluster can read it.
- Launch a cluster with a running Livy server (on AWS EMR, Livy is one of the applications you can select when creating the cluster), so that port 8998 is reachable.
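For example, assuming the AWS CLI is configured, the jar can be copied to a bucket of your choice (the names below are placeholders):

aws s3 cp <path-to-your-jar> s3://<your-bucket>/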
Now you'll be able to issue a POST request using cURL (or any equivalent) to submit your application:
curl -H "Content-Type: application/json" -X POST --data '{"className":"<your-package-name>.SimpleApp","file":"s3://<path-to-your-jar>"}' http://<cluster-domain-name>:8998/batches
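The response contains the id of the new batch session. You can then use that id to poll the job state and fetch the driver logs with the corresponding GET endpoints (the id 0 below is just an example):

curl http://<cluster-domain-name>:8998/batches/0
curl http://<cluster-domain-name>:8998/batches/0/log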