
Given <user, movie, rating>, how can I use Scala to print out the highest-rated movie for each user?


Input file (columns: userid,movie,rating):

    1,250,3.0
    1,20,3.4
    1,90,2
    2,30,3.5
    2,500,2.3
    2,20,3.3

I am supposed to get the highest-rated movie each user rated. I am completely lost: I had the program running on Hadoop, but I am brand new to Scala. The file is comma-delimited.


Solution

  • sc.textFile reads the file line by line as an RDD[String], so when you did inputfile.map(x=>(x(0),x(1))), x(0) and x(1) were the first and second characters of each line (each a Char), not the first and second comma-separated fields. reduceByKey then grouped on the first element of each tuple and passed the second element, a Char, into your reduce function. Since a Char is not a tuple, you can't access elements with ._1 and ._2, hence the errors

    error: value _1 is not a member of Char

    and

    error: value _2 is not a member of Char

    The last error has the same cause:

    error: value maxBy is not a member of

    maxBy is a collection method, so it can't be called on a Char.
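A quick way to see the difference between indexing a line and splitting it, in plain Scala without Spark. This is only an illustration; the object and method names (LineIndexing, firstTwoChars, fields) are hypothetical.

```scala
object LineIndexing {
  // Indexing a String returns the Char at that position, not a comma-separated field.
  // firstTwoChars(line)._2._1 would not compile: "value _1 is not a member of Char".
  def firstTwoChars(line: String): (Char, Char) = (line(0), line(1))

  // Splitting on the delimiter yields the fields as Strings.
  def fields(line: String): (String, String, String) = {
    val parts = line.split(",")
    (parts(0), parts(1), parts(2))
  }

  def main(args: Array[String]): Unit = {
    val line = "1,250,3.0"
    println(firstTwoChars(line)) // (1,,)  -> the digit '1' and the comma ','
    println(fields(line))        // (1,250,3.0)
  }
}
```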

    Here's the complete working solution:

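    val inputfile = sc.textFile("/home/mortaza/input/input.txt")

    // Skip blank lines, split on commas, and parse the rating as a Double so ratings compare numerically (as strings, "9" > "10")
    val rows = inputfile.filter(_.trim.nonEmpty).map(_.split(",").map(_.trim)).map(x => (x(0), (x(1), x(2).toDouble)))

    // For each user keep the (movie, rating) pair with the highest rating
    val keyval = rows.reduceByKey((a, b) => if (a._2 >= b._2) a else b)

    // Write one "user,movie,rating" line per user
    keyval.map(x => Seq(x._1, x._2._1, x._2._2).mkString(",")).saveAsTextFile("/home/mortaza/out/wordfreq")
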

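    which should generate CSV output like the following (for the input given in the question). Note that saveAsTextFile writes a directory of part files rather than a single file, and the order of the lines is not guaranteed: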

    2,30,3.5
    1,20,3.4
    
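To check the per-user logic without a cluster, here is a minimal sketch that mirrors the Spark pipeline on a plain Scala collection. It assumes the input is already in memory as a Seq[String]; the names (HighestRatedLocal, highestPerUser, toCsvLines) are hypothetical, and groupBy followed by reduce stands in for reduceByKey. It also assumes user ids are integers when sorting the printed output.

```scala
object HighestRatedLocal {
  type Rating = (String, (String, Double)) // (user, (movie, rating))

  // Skip blank lines, split on commas and parse the rating as a Double
  def parse(lines: Seq[String]): Seq[Rating] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val fields = line.split(",").map(_.trim)
      (fields(0), (fields(1), fields(2).toDouble))
    }

  // Same combine function as the reduceByKey call: keep the higher-rated pair
  def keepHigher(a: (String, Double), b: (String, Double)): (String, Double) =
    if (a._2 >= b._2) a else b

  // Local stand-in for reduceByKey: group by user, then reduce each group
  def highestPerUser(lines: Seq[String]): Map[String, (String, Double)] =
    parse(lines).groupBy(_._1).map { case (user, rows) =>
      user -> rows.map(_._2).reduce((a, b) => keepHigher(a, b))
    }

  // Same formatting as the mkString step; sorting by user id (assumed to be an integer) fixes the line order
  def toCsvLines(best: Map[String, (String, Double)]): Seq[String] =
    best.toSeq.sortBy(_._1.toInt).map { case (user, (movie, rating)) =>
      Seq(user, movie, rating).mkString(",")
    }

  def main(args: Array[String]): Unit = {
    val input = Seq("1,250,3.0", "1,20,3.4", "1,90,2", "2,30,3.5", "2,500,2.3", "2,20,3.3")
    toCsvLines(highestPerUser(input)).foreach(println)
    // prints:
    // 1,20,3.4
    // 2,30,3.5
  }
}
```

Parsing the rating as a Double matters here: compared as strings, "9" sorts after "10", so a user who rated one movie 9 and another 10 would get the wrong movie back.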

    I hope the answer is helpful