input file
(userid,movie,rating)
1,250,3.0
1,20,3.4
1,90,2
2,30,3.5
2,500,2.3
2,20,3.3
I am supposed to get the highest-rated movie each user rated. I am completely lost. I had the program running on Hadoop, but I am brand new to Scala. The file is comma delimited.
So far I have gotten this far, but I can't parse the lines correctly.
val inputfile = sc.textFile("/home/input/input.txt")
val keyval = inputfile.map(x=>(x(0),x(1))).reduceByKey{case (x, y) => (x._1+y._1, math.max(x._2,y._2))}
keyval.maxBy { case (key, value) => value }
keyval.saveAsTextFile("/home/out/word")
I get these errors -
<console>:26: error: value _1 is not a member of Char
keyval.reduceByKey{case (x, y) => (x._1+y._1,
math.max(x._2,y._2))}
^
<console>:26: error: value _1 is not a member of Char
keyval.reduceByKey{case (x, y) => (x._1+y._1,math.max(x._2,y._2))}
^
<console>:26: error: value _2 is not a member of Char
keyval.reduceByKey{case (x, y) => (x._1+y._1,math.max(x._2,y._2))}
^
<console>:26: error: value _2 is not a member of Char
keyval.reduceByKey{case (x, y) => (x._1+y._1,math.max(x._2,y._2))}
^
<console>:26: error: value maxBy is not a member of
org.apache.spark.rdd.RDD[(Char, Char)]
keyval.maxBy { case (key, value) => value }
sc.textFile reads the file line by line as an RDD[String], so when you did inputfile.map(x=>(x(0),x(1))) you indexed each line as a String and got its first and second characters, i.e. a tuple of two Chars. reduceByKey then grouped by the first Char and passed the second Char into the reduce function. Since that second element is just a Char and not a tuple, you can't access it with ._1 and ._2, which is why you got
error: value _1 is not a member of Char
and
error: value _2 is not a member of Char
The last error,
error: value maxBy is not a member of org.apache.spark.rdd.RDD[(Char, Char)]
follows because maxBy is a Scala collections method, not an RDD method, so you can't call it on keyval at all.
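To see the problem concretely, here is a minimal spark-shell sketch (line is just a stand-in for one record of your sample input) showing what indexing a String gives you versus splitting it:

val line = "1,250,3.0"
line(0)          // Char = '1'  -- the first character, not the userid field
line(1)          // Char = ','  -- the second character
line.split(",")  // Array[String] = Array(1, 250, 3.0) -- the fields you actually want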
Here's a complete working solution:
val inputfile = sc.textFile("/home/mortaza/input/input.txt")
val keyval = inputfile.map(x=>x.split(",")).map(x => (x(0), (x(1), x(2)))).reduceByKey{case (x, y) => if (x._2 <= y._2) y else x}
keyval.map(x => Seq(x._1, x._2._1, x._2._2).mkString(",")).saveAsTextFile("/home/mortaza/out/wordfreq")
which generates CSV output like the following (using the input given in the question):
2,30,3.5
1,20,3.4
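If you want to sanity-check the per-user results in the spark-shell before writing them out, something like this should do (note that the order in which collect returns the rows is not guaranteed):

keyval.collect().foreach(println)
// prints, in some order:
// (1,(20,3.4))
// (2,(30,3.5))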
I hope the answer is helpful.