I am reading a SequenceFile whose key and value are logically an Int and a String. If I do this:
val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
  .map { case (x, y) => (x.toString(), y.toString().split("/")(0), y.toString().split("/")(1)) }
  .collect
this works, because the IntWritable is converted to a String before the result is serialized.
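As an aside, Spark can do this conversion for you: the typed sequenceFile overload uses implicit WritableConverters to turn IntWritable into Int and Text into String before the records reach your code. A minimal sketch, reusing the path from the example above:

val typed_data = sc.sequenceFile[Int, String]("/seq_01/seq-directory/*")
  .map { case (x, y) => (x, y.split("/")(0), y.split("/")(1)) }
  .collect

Here x is already a plain Int and y a plain String, so the serialization problem described below never arises.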
If I do this:
val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
  .map { case (x, y) => (x, y.toString().split("/")(0), y.toString().split("/")(1)) }
  .collect
then I get this error immediately:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0 in stage 42.0 (TID 692) had a not serializable result: org.apache.hadoop.io.IntWritable
The underlying reason is serialization, but why does it have to be this awkward? It is yet another serialization aspect to keep in mind, and it only surfaces at run time, when collect tries to ship the task results from the executors back to the driver.
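If you genuinely need to keep the Writable objects themselves, one workaround is to switch the job to Kryo serialization and register the Hadoop classes, since Kryo does not require java.io.Serializable. A sketch, assuming a standalone application where you construct the SparkContext yourself:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

// Kryo can handle classes that do not implement java.io.Serializable,
// so registering the Writable types lets task results reach the driver.
val conf = new SparkConf()
  .setAppName("sequence-file-demo") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[IntWritable], classOf[Text]))
val sc = new SparkContext(conf)

Even then, converting inside the map remains the safer pattern: Hadoop's RecordReader reuses the same Writable instance for every record, so holding on to the raw objects can silently give you repeated values.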
If the goal is just to get the integer value, you need to call get() on the Writable:
.map{case (x, y) => (x.get(), y.toString().split("/")(0), y.toString().split("/")(1))}
The JVM then handles serialization of the plain Int value rather than failing on an IntWritable, which does not implement the Serializable interface.
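Putting it together (a sketch with the same path as above; it also splits the Text value once instead of twice):

val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
  .map { case (x, y) =>
    val parts = y.toString.split("/") // convert the Text to a String once, then split
    (x.get(), parts(0), parts(1))     // x.get() yields a plain Int, which serializes fine
  }
  .collect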