apache-spark, spark-streaming, dstream

sortByKey is not working on DStream


I am using the transform API of DStream (Spark Streaming) to sort the data. I am reading from a TCP socket using netcat. This is the line of code I use: myDStream.transform(rdd=>rdd.sortByKey())

It fails because the function sortByKey cannot be found. Could anyone please explain what the issue is in this step?


Solution

  • If you use netcat as an input, you are probably reading it with socketTextStream, which returns a ReceiverInputDStream[String]. In that case transform expects a function of type:

    (RDD[String]) => RDD[U]
    

    sortByKey is only available on an RDD[(T, U)] where T has a corresponding Ordering. For any other RDD you can use sortBy:

    myDStream.transform(rdd => rdd.sortBy(x => x))
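To put both options side by side, here is a minimal sketch of a complete job. It assumes a local master, a 5-second batch interval, and netcat listening on localhost:9999 (for example via `nc -lk 9999`). The object name SortedStream and the helper toPair are hypothetical and only serve the illustration. On Spark versions before 1.3, the pair-RDD implicits may additionally require `import org.apache.spark.SparkContext._`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SortedStream {
  // Hypothetical helper: turn a line into a (key, value) pair so that
  // sortByKey becomes available. Here the line itself is the key.
  def toPair(line: String): (String, Int) = (line, 1)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads (one for the receiver,
    // one for processing) and a 5-second batch interval.
    val conf = new SparkConf().setMaster("local[2]").setAppName("SortedStream")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumption: netcat is serving text on localhost:9999.
    // socketTextStream yields a DStream of String, not of pairs.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Option 1: sort the plain strings of each batch with sortBy.
    val sortedLines = lines.transform(rdd => rdd.sortBy(x => x))

    // Option 2: map to pairs first; each batch is then an
    // RDD[(String, Int)], and String has an Ordering, so sortByKey resolves.
    val sortedPairs = lines.map(toPair).transform(rdd => rdd.sortByKey())

    sortedLines.print()
    sortedPairs.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that either approach sorts each batch RDD independently. Neither produces an ordering across the whole stream.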