I am given a text file (call it text.txt). I need to count the total number of words (counting repetitions as well). My code begins like this:
def words():
    f = sc.textFile("text.txt")
    return f.DO_SOME_MAGIC()
So my question reduces to: what should go in place of DO_SOME_MAGIC?
For the following text file:
hello world
bye world
I should receive 4
and NOT:
(hello, 1)
(bye, 1)
(world, 2)
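Try this; it should work. flatMap splits each line into words and flattens the results into a single RDD of words, and count() returns the number of elements in it: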
def words():
    f = sc.textFile("text.txt")
    return f.flatMap(lambda line: line.split()).count()
To count each word only once (without repetitions), add distinct() before count():
def words():
    f = sc.textFile("text.txt")
    return f.flatMap(lambda line: line.split()).distinct().count()
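
If you want to sanity-check the Spark results on a small input, a plain-Python sketch of the same logic may help. It is not Spark code: `count_words` is a hypothetical helper name, and the lines are passed in as a list instead of being read through `sc.textFile`. It mirrors the flatMap step by flattening the per-line word lists into one sequence, then takes the total and the distinct counts. Note that `line.split()` with no argument splits on any run of whitespace and drops empty strings, so blank lines and extra spaces do not inflate the count.

```python
from itertools import chain


def count_words(lines):
    """Return (total, distinct) word counts for an iterable of text lines."""
    # Flatten the per-line word lists into one list, as flatMap would.
    words = list(chain.from_iterable(line.split() for line in lines))
    # len(words) corresponds to .count(); len(set(words)) to .distinct().count().
    return len(words), len(set(words))


sample = ["hello world", "bye world"]
total, distinct = count_words(sample)
print(total, distinct)  # 4 3
```

For the example file from the question, this gives 4 words in total and 3 distinct words, which is what the two Spark versions above should return as well.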