I am trying to learn GraphX from this code on GitHub.
On the spark-shell, when I try this:
def parseFlight(str: String): Flight = {
  val line = str.split(",")
  Flight(line(0), line(1), line(2), line(3), line(4).toInt, line(5).toLong, line(6), line(7).toLong, line(8), line(9).toDouble, line(10).toDouble, line(11).toDouble, line(12).toDouble, line(13).toDouble, line(14).toDouble, line(15).toDouble, line(16).toInt)
}
val textRDD = sc.textFile("/user/user01/data/rita2014jan.csv")
val flightsRDD = textRDD.map(parseFlight).cache()
val airports = flightsRDD.map(flight => (flight.org_id, flight.origin)).distinct
airports.take(1)
I get this exception, which points at airports.take(1):
java.lang.NumberFormatException: empty String
Can anyone let me know if I'm missing something?
It most probably comes from a row within your input in which a field you're casting to Double is empty.
The error most likely comes from this function (applied at the beginning of the Spark pipeline):
def parseFlight(str: String): Flight = {
  val line = str.split(",")
  Flight(line(0), line(1), line(2), line(3), line(4).toInt, line(5).toLong, line(6), line(7).toLong, line(8), line(9).toDouble, line(10).toDouble, line(11).toDouble, line(12).toDouble, line(13).toDouble, line(14).toDouble, line(15).toDouble, line(16).toInt)
}
At some point, .toDouble is applied to "" (an empty String).
For instance you can reproduce the same error by doing this:
"aa,,,s".split(",")(2).toDouble
which produces:
java.lang.NumberFormatException: empty String
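One way to avoid the exception is to parse the numeric fields defensively. The sketch below is only an illustration: toDoubleOpt is a hypothetical helper name (it is not part of the original code), and whether a missing value should become None, 0.0, or cause the row to be dropped depends on your data. It also shows a related pitfall: split(",") silently drops trailing empty fields, so a row ending in empty columns can be shorter than expected.

```scala
import scala.util.Try

// Hypothetical helper: returns None instead of throwing when a field
// is empty or otherwise not a valid number.
def toDoubleOpt(field: String): Option[Double] =
  Try(field.trim.toDouble).toOption

val fields = "aa,,,s".split(",")

val missing = toDoubleOpt(fields(2))   // None: the field is ""
val present = toDoubleOpt("3.5")       // Some(3.5)

// Assumption: 0.0 is an acceptable default for a missing value.
val withDefault = toDoubleOpt(fields(2)).getOrElse(0.0)

// split(",") drops trailing empty fields; a negative limit keeps them.
val shortRow = "a,b,,".split(",")      // 2 elements
val fullRow  = "a,b,,".split(",", -1)  // 4 elements
```

Inside parseFlight, a call such as line(9).toDouble could then become toDoubleOpt(line(9)).getOrElse(0.0), provided a default value makes sense for that column.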
The error is reported at the line containing airports.take(1) because that is the first action in your pipeline. Transformations such as map are lazy: they only describe the computation, and nothing runs until an action forces evaluation. So parseFlight is not actually executed until take(1) is called, and that is where the exception surfaces.
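The same deferred failure can be seen without Spark. The following is a rough analogy only: a plain Scala Iterator stands in for the RDD, and the names rows, parsed and badRows are made up for the illustration. It also shows a simple way to find the offending rows.

```scala
import scala.util.Try

// Stand-in data for the illustration; the second row has an empty field.
val rows = List("1.5", "", "2.0")

// Like an RDD transformation, map on an Iterator is lazy:
// nothing is parsed here, so nothing fails yet.
val parsed = rows.iterator.map(_.toDouble)

// Consuming the iterator plays the role of an action such as take(1):
// only now is toDouble called, and the NumberFormatException is thrown.
val result = Try(parsed.toList)

// Keeping only the rows that fail to parse pinpoints the bad input.
val badRows = rows.filter(r => Try(r.toDouble).isFailure)
```

On the real data, the same filtering idea might look like textRDD.filter(line => Try(parseFlight(line)).isFailure).take(5) in the spark-shell, which should return a few of the lines that cannot be parsed.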